1.  Introduction


The process of locating and identifying significant changes or new activities, known as change
detection (CD), is one of the most important imagery exploitation tasks [5]. Previous research
on CD has emphasized the development of general-purpose methods that can be employed to
screen a wide variety of imagery and determine, without access to any site-specific model
information, whether any significant changes or events have occurred between the times of
acquisition of the imagery. These methods have been found to be unreliable for two reasons.
First, CD techniques based on more or less sophisticated differencing of images (possibly
after attempted corrections for viewpoint and illumination differences) are extremely
sensitive to errors in registration and in the photometric models (e.g., reflectance,
illumination) that are used. Second, too many inconsequential changes occur in any natural
environment. Even if general-purpose methods could be developed for screening out all
changes due to variations in viewpoint, sensor and illumination, there would still be many
differences between the images whose significance could only be determined by an image
analyst (IA) using comprehensive site knowledge and the relevant intelligence agenda. Thus
the goal of relieving the IA of the burden of screening large subsets of acquired imagery is
unlikely to be achieved using such general-purpose methods.

We plan, instead, to develop a model-based vision system for CD, incorporating image
understanding (IU) techniques whose primitives are specific to a particular site type. The
IA can use the system's IU techniques to conduct spatially constrained analyses whose
outcomes may indicate changes that have intelligence significance. The system is site model
driven and will be based on three classes of primitives: object primitives, which correspond
to the specific objects that occur in a particular site model and to the generic object
classes supported by the IU system; spatial primitives, for the construction of search
locales and the specification of constraints on the search for object types within locales;
and temporal primitives, which can constrain or parameterize the analysis by factors such as
time of day, day of week, time of year, etc. The system will assist the IA by highlighting
areas on an image where there are relevant activities or new or upgraded facilities.

As reported in [5], IAs have identified two ways in which IU can be useful in CD: the
“quick-look” (QL) and “final-look” (FL) modes. In the QL mode, small areas where any
change would be considered significant are declared a priori, and when the system is presented
with a series of images, only those that satisfy the conditions in the QL profile are marked.
In the FL mode, a set of less important areas to be examined for change is specified; the IA
wants to examine these areas to ensure complete coverage of the site. As the IA gains
experience, both the QL and FL profiles can be modified. The CD system that we plan to build
will primarily be guided by QL profiles.

The site models considered in the current phase of RADIUS encode only the spatial
relationships between fixed objects of interest in a site, such as buildings, roads, etc. An
important issue in training new analysts or reviewing infrequently analyzed sites is the
coding of the temporal relationships which describe changes in the site, such as movements
of vehicles under normal or abnormal circumstances (i.e., a site activity model). The CD
system described above will be a valuable step toward the development of a site activity
modeling capability.

Generally  the  first  step  in  a  CD  task  is  the  registration  of  an  image  to  an  existing  site 
model.  Depending  on  the  CD  task,  using  the  existing  site  model  and  camera  parameters, 
regions  of  interest  in  the  given  image  can  be  delineated.  Subsequently,  objects  such  as 
buildings  and  vehicles  that  are  characteristically  present  in  the  site  can  be  extracted  and 
analyzed for CD purposes. Such object extraction algorithms cannot be purely bottom-up.
For example, in extracting buildings [13], heuristics based on the expected shapes of roofs
(site-specific information) are very useful for completing any partial roof hypotheses that
result from imperfect bottom-up processing. Likewise, shadow analysis is very useful for
obtaining height information [6, 7], or allowing the IU system to explain why some building
features that are in the field of view cannot be identified in the image. Site models can also
be  very  useful  for  providing  geometric  and  photometric  constraints  that  reduce  matching 
ambiguities. 

In addition to image-to-site-model registration, we are also interested in image-to-image
registration, where two images acquired from possibly severe off-nadir viewing conditions
need to be registered prior to performing change detection. Image-to-image registration is
useful for building site models, for developing automatic image-to-site-model registration
algorithms, and for performing the subtask of transforming a given image to a “favored
orientation” [5]. The images to be analyzed as part of the RADIUS-related research program
are high-resolution images of complicated sites. In many of the currently used image
registration  algorithms,  tie  points  need  to  be  manually  selected.  This  can  be  a  laborious 
task.  Automatic  registration  of  the  two  images  is  desirable.  Given  the  variability  of  viewing 
directions,  illumination  conditions  and  resolution,  the  features  used  for  matching  may  be 
poorly  localized  or  occluded.  Automatic  image-to-image  registration  is  accomplished  using 
appropriate  cues  from  site  models  and  camera  models. 

It is evident that the IA must perform a crucial role in directing, manipulating and
correcting the results of IU algorithms. An important part of our approach is the inclusion of
early feedback, by users familiar with the final application, as to the usability of the
algorithms developed under this program. These evaluations will provide valuable information
with respect to the likely models and levels of interaction to be expected from IAs, the clarity
and intuitive understandability of the IU algorithms, and whether the typical IA is able to
tailor the responses of the algorithm to his/her needs.

2.  Research  Areas 

2.1.  Site  Model  Supported  Monitoring 

Our approach to image monitoring is based on the idea of QL profiles. QL profiles are the
image exploitation recipes constructed by an IA for a given site; they characterize changes
that are significant to the site. The tasks in a QL profile are related to each other both
temporally and spatially. For example, if the interest of the IA in a given site concerns
military activity, the first task to be performed depends on previous knowledge about the
site (if it exists). If the reports from previous analyses indicate that armament was present
in a training ground, the QL profile will call for vehicle detection in the training ground first.
If there are still many vehicles in the training ground, the QL profile will report “the exercise
continues” and call for a vehicle pattern analysis task. On the other hand, if the first task
reports that there are almost no vehicles in the training ground, then the QL profile will
trigger vehicle detection on the roads and in the garage area. If many vehicles are found in
the garage area, then the report “the training is finished and the armament is back in the
camp” is sent. If the vehicles are not in the garage area, the QL profile will trigger pattern
analysis on the road, send a report about the heading of the formation, etc. In another
situation, if no previous information about troop formation is available or the reports from
previous image analyses indicate that the armament was in the garage area, the first task
to be called from the QL profile will be vehicle detection in the garage area; based on its
results, further analysis of the road and the training ground may be called for.
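The branching described above can be written down as a small decision procedure. The following is only a sketch of that logic: the function name, the region labels, the task strings and the threshold `MANY` are hypothetical and are not part of the RADIUS software.

```python
from typing import Callable, List, Optional, Tuple

MANY = 10  # hypothetical threshold separating "many" from "almost no" vehicles


def run_ql_profile(previous_location: Optional[str],
                   count_vehicles: Callable[[str], int]) -> Tuple[List[str], List[str]]:
    """Return (reports, follow_up_tasks) for one newly acquired image."""
    reports: List[str] = []
    tasks: List[str] = []
    if previous_location == "training_ground":
        # Armament was last seen in the training ground: look there first.
        if count_vehicles("training_ground") >= MANY:
            reports.append("the exercise continues")
            tasks.append("pattern_analysis:training_ground")
        elif count_vehicles("garage") >= MANY:
            # Training ground is nearly empty and the garage area is full.
            reports.append("the training is finished and the armament is back in the camp")
        else:
            # Vehicles are neither in the training ground nor in the garage area.
            tasks.append("pattern_analysis:road")
    else:
        # No prior information, or armament was last seen in the garage area.
        if count_vehicles("garage") < MANY:
            tasks.append("vehicle_detection:road")
            tasks.append("vehicle_detection:training_ground")
    return reports, tasks
```

Here `count_vehicles` stands in for whatever vehicle detection routine is run on a delineated region; in a real profile the follow-up tasks would themselves trigger further analyses.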

In  a  typical  site  model  supported  monitoring  task,  given  a  new  image,  we  first  register 
the  new  image  to  the  site  model  or  the  old  images  in  the  existing  site  folder.  We  then 
delineate  the  regions  of  interest  according  to  the  task.  Next,  2-D  templates  of  the  objects 
to  be  monitored  are  formed  based  on  their  3-D  structure  and  information  from  the  site 
model.  Primitive  features  such  as  circles,  ellipses,  rectangles,  and  parallel  lines  are  extracted, 
grouped and compared to the templates of the objects. Candidates with sufficiently high
scores of consistency with the object templates are further verified and reported to the IA.
Figure  1  shows  a  general  flowchart  of  our  image  monitoring  system.  For  different  monitoring 
tasks,  the  2-D  object  models,  primitive  features  to  be  extracted,  and  grouping  mechanism 
are  defined  differently.  For  example,  for  vehicle  detection  from  aerial  images,  a  vehicle  can 
be  modeled  as  a  rectangle  of  a  certain  size  oriented  along  the  road  line.  For  detection  of 
activities  such  as  construction  of  chimneys,  the  needed  model  (for  a  cylindrical  object)  is  a 
little more complicated. It should have an ellipse on top of two parallel lines; the minor axis
of the ellipse should be parallel to the two supporting lines, which in turn are parallel to the
camera viewing direction.
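As an illustration of the simplest template mentioned above (a vehicle as a rectangle of a certain size oriented along the road), a candidate rectangle can be checked against size and orientation constraints. This is a minimal sketch; the function name, the size ranges (in meters) and the angular tolerance are assumed values chosen for illustration, not parameters from the system described here.

```python
def matches_vehicle_template(length, width, angle_deg, road_angle_deg,
                             length_range=(3.0, 8.0), width_range=(1.5, 3.0),
                             max_angle_diff_deg=15.0):
    """True if a candidate rectangle has vehicle-like size and lies along the road."""
    # The orientation of a rectangle is only defined modulo 180 degrees.
    diff = abs(angle_deg - road_angle_deg) % 180.0
    diff = min(diff, 180.0 - diff)
    size_ok = (length_range[0] <= length <= length_range[1]
               and width_range[0] <= width <= width_range[1])
    return size_ok and diff <= max_angle_diff_deg
```

In practice the size limits would be derived from the image resolution and the camera model, and the road direction would come from the site model.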

We have developed a preliminary design of an IU system for monitoring aerial images.
The system is guided by an underlying site model, and by available knowledge of acquisition
and illumination parameters, and performs task-specific image analyses for answering
possible queries from an IA. We plan to extend the capabilities of our system by integrating
collateral information about the various objects in the site, a user interface, and a more
comprehensive set of QL profiles. Extensions to images acquired by synthetic aperture radar
are also planned.

2.2.  Registration  Algorithms 

We are investigating two types of registration processes, image-to-site-model registration and
image-to-image registration. Depending on the particular CD task, e.g., if building or vehicle
related activity is being monitored, we can use the site model and viewing direction of the
new image to identify regions in the image that need further analysis. We can subsequently
invoke the necessary IU algorithms related to detection of construction activities, vehicle
location and counting (and road extraction, if construction of roads is monitored). For tasks
such as these, the newly acquired image needs to be registered to the existing site model
prior to any CD task.

Figure 1: A block diagram of the image monitoring system.

We have developed an image-to-site-model registration procedure which first transfers the
given approximate camera model to RCDE format, then requires the IA to manually adjust
the locations of some points whose 3-D coordinates are known, and finally uses the RCDE
camera resection function to get an accurate camera model for the newly acquired image.
Our final goal in image-to-site-model registration is to make the process totally automatic.
Both the selection of control points and the search for their matches in the newly acquired
image will be performed automatically.

In  addition  to  image-to-site-model  registration,  which  will  be  directly  useful  for  CD, 
we  are  also  developing  a  general-purpose  image-to-image  registration  algorithm.  Such  an 
algorithm will be useful for building site models, orienting an image in a “favored position”,
and delineating regions of interest. The traditional stereo paradigm [14] for inferring 3-D
structure is not applicable to images acquired from severe off-nadir viewing directions. Our
goal is to develop a completely automatic registration algorithm using site models and any
auxiliary information such as camera parameters. Site models will be useful for registering
two  severely  off-nadir  images,  as  we  can  predict  the  contrasts  of  features  in  both  images, 
occlusions  of  features  and  shadow  regions. 

2.3.  Region  Delineation 

Region delineation is an important step for outlining the regions to be exploited by IU
algorithms and providing collateral information for IU algorithms. Two kinds of region
delineation are useful for CD tasks: macro region delineation and micro region delineation.
Macro region delineation labels the regions of interest to the IA, hence saving computation
by not monitoring irrelevant areas. Two methods for macro region delineation have been
developed in our system. When the region object is available from the site model, we directly
project the region boundaries onto the image to be monitored and label the region(s) in the
image domain. When the region of interest is given on a map or an old image, we use
image-to-image registration to transform the regions of interest into the new image.
Both methods use camera model information available from the site model. Micro region
delineation further labels regions of occlusion and shadow according to the camera model and
local objects. Consider, for example, the problem of identifying the region in an aerial image
corresponding to a given parking lot. While estimates of sensor and platform parameters
are known, it is not sufficient to simply project the parking lot boundaries onto the image
plane using these parameters, since these parameters are subject to errors. Furthermore,
determining which parts of the parking lot are visible in the image (since parts of the parking
lot can be occluded by other objects in the site) and the illumination conditions in the visible
part of the parking lot (parts of which may be in shadow depending on sun angle and site
model geometry) are critical to subsequently making a correct decision as to whether there
is a significant difference between the numbers of observed and expected vehicles in the
parking lot. In fact, the feasibility of performing a CD task depends on the IU system
correctly modeling the relationship between a given image and the site model (for example,
if we were interested in whether a large number of vehicles are parked near a certain building,
it could be important to determine if that part of the parking lot is, in fact, visible in the
image). We have working algorithms for macro region delineation and will develop a method
for micro region delineation in the second year of the project.
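The first macro delineation method, projecting a region boundary from the site model onto the image, can be sketched as follows. The sketch assumes an idealized pinhole camera with rotation matrix `R`, camera center `cam_center` and focal length `focal_length`, and the projection convention u = f x_c / z_c, v = f y_c / z_c; these names and the sign convention are assumptions made for illustration, not the RCDE camera model. It does not account for the parameter errors, occlusion or shadow discussed above.

```python
import numpy as np


def project_points(points_w, R, cam_center, focal_length):
    """Project 3-D world points (N x 3) to 2-D image-plane coordinates (N x 2)."""
    pts = np.asarray(points_w, dtype=float)
    center = np.asarray(cam_center, dtype=float)
    # Camera-centered coordinates: shift by the camera center, then rotate.
    pc = (np.asarray(R, dtype=float) @ (pts - center).T).T
    # Perspective division by the depth along the optical axis.
    return focal_length * pc[:, :2] / pc[:, 2:3]
```

Applying `project_points` to the vertices of a region polygon gives the polygon outline in the image domain, which can then be filled to label the region of interest.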


2.4.  Site  Model  Construction 

An integral component of site model based registration and change detection is the
availability of site models. We have made considerable progress on site model construction using
RCDE. We are working on updating a site model on an ongoing basis. The solution to
site model construction assumes that several overlapping coverage images are available. We
have constructed a site model for model-board-2 images using RCDE. The recently developed
site-model-to-image registration algorithm (detailed in Section 3.2) has been used to
register model board 2 images to the site model. Using the model supported construction
monitoring  algorithms  that  we  have  developed,  as  well  as  others  under  development,  we  will 
be  able  to  form  hypotheses  about  objects  in  the  site.  When  two  or  more  images  confirm  the 
same  hypotheses  about  the  underlying  object,  the  initial  assertions  about  the  object  will  be 
replaced  by  image-derived  assertions.  This  will  be  done  in  an  incremental  fashion.  During 
the  early  stages,  the  errors  due  to  incomplete  specification  of  site  models  may  be  handled  by 
allowing  more  tolerance  in  the  predicted  positions  of  features  and  their  computed  attributes. 
As  more  images  become  available,  the  representation  error  will  decrease. 
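The incremental update rule described above (an initial assertion is replaced once two or more images confirm the same hypothesis) might be organized as in the following sketch. The class name, the string encoding of hypotheses and the `min_confirmations` parameter are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict


class ObjectAssertion:
    """Holds one asserted attribute of a site-model object."""

    def __init__(self, initial, min_confirmations=2):
        self.value = initial                    # initial assertion about the object
        self.min_confirmations = min_confirmations
        self._support = defaultdict(set)        # hypothesis -> ids of supporting images

    def add_observation(self, image_id, hypothesis):
        """Record that one image supports a hypothesis; return the current assertion."""
        self._support[hypothesis].add(image_id)
        # Replace the assertion only when enough distinct images agree.
        if len(self._support[hypothesis]) >= self.min_confirmations:
            self.value = hypothesis
        return self.value
```

Counting distinct image identifiers, rather than observations, keeps a single image from confirming its own hypothesis twice.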

2.5.  Integration  of  RCDE 

Because the RADIUS research team includes several institutions, all developed software is
required to be integrated into RCDE; this enables efficient sharing of research results within
the community and efficient transfer of technology to IAs. For the RADIUS project, a
program  is  considered  as  being  integrated  into  RCDE  if  it  is  either  written  in  Lucid  Common 
Lisp  or  is  a  foreign  function  executable  from  RCDE  (preferably  through  an  online  menu). 
In  the  first  year  of  the  RADIUS  project,  we  have  been  one  of  the  RCDE  test  sites  and 
have  gained  considerable  experience  in  using  RCDE.  We  have  used  RCDE  to  build  a  site 
model  which  includes  all  the  images  for  model  board  2.  We  have  developed  a  method  for 
delineating regions of interest using RCDE basic functions. We have also transferred some
of our algorithms into RCDE and made them selectable from RCDE menus. In addition, the
parameters  can  be  specified  through  the  RCDE  environment  and  the  results  are  represented 
as  RCDE  objects  which  can  be  easily  used  by  RCDE  functions.  Many  of  these  programs 
have  been  ported  to  the  Martin  Marietta  Group,  King  of  Prussia,  PA  and  tested  on  real 
images. 

3.  Accomplishments  to  Date 

During  the  first  year  under  the  contract,  we  have  made  considerable  progress  on  several 
fronts: 

1.  We  have  installed  RCDE  on  all  of  our  SPARC-10  systems  and  built  a  site  model  for 
the  model-board-2  images. 

2.  We have added new functions into RCDE.

3.  We have developed a novel image-to-image registration algorithm that can automatically
register two off-nadir images, when no information about the camera is available.

4.  We have developed a simple image-to-site-model registration mechanism that uses
available (approximate) information about camera parameters.

5.  We  have  developed  image  delineation  algorithms  that  outline  regions  of  interest  useful 
for  change  detection  tasks. 

6.  We  have  developed  site-model-supported  change  detection  algorithms  and  illustrated 
them  for  monitoring  new  construction  and  detecting  and  counting  vehicles. 

7.  We  have  integrated  algorithms  for  image-to-site-model  registration,  image  delineation, 
and  monitoring  into  RCDE.  Many  of  these  algorithms  have  been  ported  to  the  Martin 
Marietta  Group,  King  of  Prussia,  PA. 

More  details  about  the  algorithms  and  experimental  results  obtained  on  model  board 
images,  as  well  as  real  images,  are  given  in  the  remainder  of  this  report. 

3.1.  Site  Model  Construction 

A  site  model  is  a  3-D  mathematical  representation  of  the  site  [1].  As  minimum  requirements, 
it  includes:  (a)  2-D  and  3-D  geometric  descriptions  of  site  features  such  as  areas,  buildings 
and  structures,  roads,  etc.  (b)  A  set  of  images  associated  with  the  site  and  their  imaging 
conditions  such  as  camera  position,  camera  orientation,  focal  length,  illuminant  direction, 
etc.  (c)  Object  attributes  such  as  name,  type,  and  status  (inactive,  under  construction,  etc.) 
associated  with  each  feature. 

We  use  the  following  procedure  to  build  a  site  model: 

1.  Display  two  or  more  input  images. 

2.  Create  a  default  world  coordinate  system. 

3.  Create  default  camera  models  for  the  input  images. 

4.  Manually locate (at least) four control points for each input image; the 3-D coordinates
of these control points in the world coordinate system are assumed known.

5.  Input  the  camera  focal  length  and  the  location  of  the  principal  point  (in  the  image 
plane)  for  each  input  image. 

6.  Do  camera  resection  [11]  to  get  the  correct  camera  model  for  each  input  image. 

7.  Add objects to the site model interactively, using object templates such as box, cylinder,
house, and their compositions, which are provided in RCDE.

Figure 2 shows an example of building a new site model for model board 2. Figures 2(a) and (b)
show two input images (M1 and M2) with control points marked. The 3-D coordinates and
image plane indices of these control points are used to get an accurate camera model for
each input image. Once the new images are registered to the world coordinates, objects can
be added to the site model; Figures 2(c) and (d) show an example of adding a building object.
When adding a new object, the 3-D frame of the object is displayed in all the images in the
site model, and the images are used as references in adjusting the size and orientation of the
3-D object. Figure 2(e) shows a partial site model superimposed on M1, and (f) shows a
rendering of the site model according to the camera model for M2.

Figure 2: Building a site model using RCDE. (a) Input image-1 (M1). (b) Input image-2 (M2).
(c) A new object shown in M1. (d) A new object shown in M2. (e) A partial site model
superimposed on M1. (f) Rendering of the site model on M2.

3.2.  Image-to-Site-Model  Registration

In order to use information from a site model, an image first has to be registered to the site
model. To register an image to the site model, we first need to understand and unify the
camera models. In many image exploitation tasks, the camera parameters are available in
terms of camera position and orientation in a world coordinate system, while the camera
model used in photogrammetry is represented by the conformal transformation [10], in which
camera parameters are represented in a camera-centered coordinate system. Here we present
an image-to-site-model registration algorithm. Assuming that approximate camera parameters
are available (as in the RADIUS project), a method for computing initial camera
parameters from given imaging conditions is introduced. Next, the existing site model is
projected onto the new image using the initial camera model. Finally, control points with
known 3-D coordinates are manually adjusted to the correct locations in the new image
domain and the RCDE resection operation [11] is employed to refine the camera parameters.
Using the image-to-site-model registration algorithm, we have successfully built a site model
for all forty images in the model board 2 data set, verified the given control points for model
board 2, and refined the camera parameters for each model board image.

In  the  remainder  of  this  section,  first  we  briefly  summarize  the  conformal  transformation 
used  in  photogrammetry.  Next,  we  study  the  camera  representations  given  in  our  data  base. 
The  relationship  between  the  two  representations  is  pointed  out,  followed  by  an  algorithm 
for image-to-site-model registration. Experimental results on initial camera parameter
estimation, camera parameter refinement, and control point verification are presented at the
end  of  the  section. 

3.2.1.  Conformal  Transformations 

In conformal transformations, camera-centered coordinates are obtained by first shifting
the world coordinates by $(x_o, y_o, z_o)$, then rotating the resulting coordinates around the
x-axis by $\omega$, followed by a rotation by $\phi$ around the resulting y-axis, and finally a rotation
by $\kappa$ around the resulting z-axis. A positive rotation is defined as a clockwise rotation when
viewed from the origin in the direction of the positive axis. Assuming the coordinates of
a point in the world coordinate system are $(x_w, y_w, z_w)$, and the coordinates of the point
in the camera-centered coordinate system are $(x_c, y_c, z_c)$, the transform from $(x_w, y_w, z_w)$ to
$(x_c, y_c, z_c)$ is given by

\[
\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix}
= R_z(\kappa)\, R_y(\phi)\, R_x(\omega)
\begin{pmatrix} x_w - x_o \\ y_w - y_o \\ z_w - z_o \end{pmatrix}
\]
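The shift-then-rotate structure of the conformal transformation can be sketched numerically as below. The sketch assumes that each elementary rotation rotates the coordinate axes in the right-handed sense, so that the z-rotation has the same form as the z-rotation matrices given in Section 3.2.2; the function names are hypothetical, and the sign convention should be checked against the photogrammetric reference [10] before use.

```python
import numpy as np


def rot_x(a):
    """Rotate the coordinate axes about the x-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])


def rot_y(a):
    """Rotate the coordinate axes about the y-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])


def rot_z(a):
    """Rotate the coordinate axes about the z-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])


def world_to_camera(p_w, origin, omega, phi, kappa):
    """Shift by the camera origin, then rotate about x, y and z in that order."""
    R = rot_z(kappa) @ rot_y(phi) @ rot_x(omega)
    return R @ (np.asarray(p_w, dtype=float) - np.asarray(origin, dtype=float))
```

Note that the matrix applied first to the shifted point, the x-rotation, is the rightmost factor of the product.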
3.2.2.  Camera  Specification  in  Model-Board  Data

Although it is simple to represent a camera model in conformal form, the three rotation
angles $\omega$, $\phi$ and $\kappa$ are not intuitive. Commonly available camera parameters are camera
position and camera viewing direction with respect to the world coordinate system. Given
the camera viewing direction, i.e., off-nadir angle $\alpha$ and azimuth angle $\theta$ measured east of
north (as shown in Figure 3), alignment of the world coordinates to the camera-centered
coordinates can be achieved through the following four operations:

1.  Translate the world coordinates by $(x_o, y_o, z_o)$;

2.  Rotate around the resulting z-axis by $\beta$ to align the y-axis with the camera azimuth
direction (so that the camera is looking at the origin from the resulting positive y-axis);
$\beta$ is determined by the camera azimuth angle and the angle from the positive x-axis of
the world coordinates to the north direction;

3.  Rotate around the resulting x-axis by $-\alpha$, where $\alpha$ is the given camera elevation angle;

4.  Rotate around the resulting z-axis by $\gamma$ to let the north direction be $N_i$, measured in the
image domain; $\gamma = N_c - N_i$, where $N_c$ is the (predicted) angle for the north direction
after Step 3.

Figure 3: Camera orientation in the world coordinates.

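The rotation matrices for steps 2-4 are

\[
R_2 = R_z(\beta) =
\begin{bmatrix} \cos\beta & \sin\beta & 0 \\ -\sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\tag{4}
\]

\[
R_3 = R_x(\alpha) =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}
\tag{5}
\]

\[
R_4 = R_z(\gamma) =
\begin{bmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}
\tag{6}
\]

where $\gamma = N_c - N_i$. Thus, the total rotation matrix is

\[
R = R_4 R_3 R_2 =
\begin{bmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
\cos\beta & \sin\beta & 0 \\
-\cos\alpha\sin\beta & \cos\alpha\cos\beta & -\sin\alpha \\
-\sin\alpha\sin\beta & \sin\alpha\cos\beta & \cos\alpha
\end{bmatrix}
\tag{7}
\]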
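As a numerical check on the product R = R_4 R_3 R_2 of the matrices (4)-(6), the following sketch builds the three matrices exactly as written there and multiplies them. The function names are hypothetical; the angles are in radians.

```python
import numpy as np


def r2(beta):
    """Eq. (4): rotation about the z-axis by beta."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])


def r3(alpha):
    """Eq. (5): rotation about the x-axis, written in terms of alpha."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def r4(gamma):
    """Eq. (6): rotation about the z-axis by the roll angle gamma."""
    c, s = np.cos(gamma), np.sin(gamma)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])


def total_rotation(alpha, beta, gamma):
    """Eq. (7): the matrix applied first (R_2) is the rightmost factor."""
    return r4(gamma) @ r3(alpha) @ r2(beta)
```

With gamma set to zero the result reduces to the product R_3 R_2, whose second row is (-cos(alpha) sin(beta), cos(alpha) cos(beta), -sin(alpha)); for any angles the result is orthogonal with determinant 1.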
When the distance from the camera to the stare point, r, is given, the initial camera
translation is computed as

3.2.3.  Relationship  Between  the  Two  Representations

As the values of $x_o$, $y_o$, $z_o$ and R are independent of the interpretation of how the camera
is aligned, by comparing the corresponding terms in (2) and (7) we can convert the camera
parameters from one representation to the other. Note that the camera is always above the
horizon, so that

The relationships between the camera parameters in the two representations are

3.2.4.  Camera  Roll  Estimation


Given the viewing direction of the camera, the camera can still rotate around its optical
axis, leaving one degree of freedom undetermined. If the north direction is known in world
coordinates, we can determine the orientation of the north vector in camera-centered
coordinates which are free of camera roll; the angle between the predicted north direction and
the north direction in the image plane is equal to the camera roll angle. Since the camera
azimuth angle $\theta$ is measured east of north, after aligning the y-axis with the camera viewing
direction, the angle from the x-axis to the north direction is $\pi/2 + \theta$, where $\theta$ is the azimuth
angle of the (given) camera viewing direction. We then rotate the axes around the resulting
x-axis by $-\alpha$ to align the x-axis with the camera viewing direction; the angle from the
resulting x-axis to the north direction projected onto the x-y plane is

= arctan

In our work, we estimate the north direction in an image plane by hand picking two
points along “access road 1”, $(X_1, Y_1)$ and $(X_2, Y_2)$, and computing

The camera roll angle is then computed as

$\gamma = N_c - N_i = \arctan$
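The roll computation can be sketched as follows, under the assumption that the north direction in the image is taken as the direction from the first hand-picked point to the second and that angles are measured from the image x-axis. The function names and these conventions are assumptions made for illustration; the predicted angle N_c is taken as an input rather than derived here.

```python
import math


def north_angle_in_image(p1, p2):
    """Angle N_i (radians) of the direction from p1 to p2, measured from the image x-axis."""
    (x1, y1), (x2, y2) = p1, p2
    return math.atan2(y2 - y1, x2 - x1)


def camera_roll(predicted_north, p1, p2):
    """Roll angle gamma = N_c - N_i, wrapped to the interval (-pi, pi]."""
    gamma = predicted_north - north_angle_in_image(p1, p2)
    return math.atan2(math.sin(gamma), math.cos(gamma))
```

Using `atan2` instead of a plain arctangent of a ratio avoids a division by zero when the two points share the same x-coordinate and keeps the quadrant of the direction.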
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3.2.5.  Stare  Point  Estimation 

With the camera rotation angles determined, we still need the coordinates of the camera center
(in the world coordinate system) to register the camera coordinates to the world coordinates.
The camera position is usually specified by giving the stare point and the distance between
the stare point and the camera center. The stare point information is important for automatic
camera model refinement. For the model board 2 data the stare points are not available. We
estimate the stare point from the difference between the coordinates of a known 3-D point in
the approximate camera model and its correct coordinates. First, assuming that the stare
point $(x_p, y_p, z_p)$ is available, the transform from the world coordinates $(x_w, y_w, z_w)$
to the camera-centered coordinates $(x_c, y_c, z_c)$ is

$$\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = R \begin{pmatrix} x_w - x_{or} - x_p \\ y_w - y_{or} - y_p \\ z_w - z_{or} - z_p \end{pmatrix} \tag{23}$$

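where $(x_{or}, y_{or}, z_{or})$ are determined by (11-13). Next consider the case in which the
stare point is either unavailable or incorrect, and let the assumed value be
$(\hat{x}_p, \hat{y}_p, \hat{z}_p)$. The camera transformation becomes

$$\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = R \begin{pmatrix} x_w - x_{or} - \hat{x}_p \\ y_w - y_{or} - \hat{y}_p \\ z_w - z_{or} - \hat{z}_p \end{pmatrix} \tag{24}$$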

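Now pick a point $(x'_w, y'_w, z'_w)$ whose coordinates under transform (24) are equal to the
coordinates $(x_c, y_c, z_c)$ of $(x_w, y_w, z_w)$ under (23):

$$\begin{pmatrix} x_c \\ y_c \\ z_c \end{pmatrix} = R \begin{pmatrix} x'_w - x_{or} - \hat{x}_p \\ y'_w - y_{or} - \hat{y}_p \\ z'_w - z_{or} - \hat{z}_p \end{pmatrix} \tag{25}$$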

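By taking the difference between (23) and (25) we get

$$\begin{pmatrix} x_w - x_p - x'_w + \hat{x}_p \\ y_w - y_p - y'_w + \hat{y}_p \\ z_w - z_p - z'_w + \hat{z}_p \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{26}$$

or

$$\begin{pmatrix} x_p - \hat{x}_p \\ y_p - \hat{y}_p \\ z_p - \hat{z}_p \end{pmatrix} = \begin{pmatrix} x_w - x'_w \\ y_w - y'_w \\ z_w - z'_w \end{pmatrix} \tag{27}$$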


If we select $(x_w, y_w, z_w) = (0, 0, 0)$, then by (27) the adjustment for the initial stare
point estimate is $(-x'_w, -y'_w, -z'_w)$. The procedure for adjusting the stare point estimate
is as follows:


1. Project the site model to the new image domain using the given approximate camera
model.


2. Determine the coordinates $(x'_w, y'_w, z'_w)$ of the origin under the approximate camera model.

3. The adjustment for the stare point estimate is $(-x'_w, -y'_w, -z'_w)$.
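The adjustment procedure can be checked numerically with a short sketch. The code below assumes that the camera transform has the form "rotation applied to (world point minus camera offset minus stare point)", as in (23) and (24); the rotation matrix, the helper names, and the sample numbers are illustrative assumptions rather than values taken from the experiments.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def to_camera(rot, world, cam_offset, stare):
    """Camera-centered coordinates: R * (world - cam_offset - stare)."""
    d = [world[i] - cam_offset[i] - stare[i] for i in range(3)]
    return mat_vec(rot, d)

def stare_point_adjustment(rot, cam_offset, stare_guess, origin_cam):
    """Return (-x'_w, -y'_w, -z'_w), the correction to add to stare_guess.

    origin_cam holds the correct camera-centered coordinates of the world origin.
    """
    # Invert the approximate transform: which world point does it map to origin_cam?
    back = mat_vec(transpose(rot), origin_cam)  # R is orthonormal, so R^-1 = R^T
    w_prime = [back[i] + cam_offset[i] + stare_guess[i] for i in range(3)]
    return [-c for c in w_prime]

# Illustrative setup: a rotation about the z-axis and a stare point unknown to the model.
t = math.radians(30.0)
rot = [[math.cos(t), -math.sin(t), 0.0],
       [math.sin(t),  math.cos(t), 0.0],
       [0.0,          0.0,         1.0]]
cam_offset = [6974.0, 0.0, 8311.0]
true_stare = [839.0, 719.0, 0.0]
guess = [0.0, 0.0, 0.0]

origin_cam = to_camera(rot, [0.0, 0.0, 0.0], cam_offset, true_stare)
adjust = stare_point_adjustment(rot, cam_offset, guess, origin_cam)
corrected = [guess[i] + adjust[i] for i in range(3)]
print([round(c, 3) for c in corrected])
```

In this idealized setting the corrected estimate coincides with the true stare point; with real imagery the correct camera-centered coordinates of the origin are only known approximately, so the adjustment is an estimate.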


3.2.6.  Algorithm 

We  use  the  following  procedure  to  register  a  new  image  to  an  existing  site  model: 

1. Manually select two points along the north direction and compute $N_i$ using (21).

2. Compute the camera roll angle $\gamma$ using (22).

3. Compute the conformal camera parameters using (8-16). Set $(x_p, y_p, z_p)$ to zero if they
are unavailable.

4. Project the site model to the new image using the parameters obtained from step 3.

5. Adjust the estimate of the stare point if the offset in the initial projection is too
large.

6. Manually adjust (at least) four control points to the correct locations in the image
domain and perform camera resection to get more accurate conformal camera parameters.

7. Compute the camera parameters in the world coordinate system using (18-19) and

$$\beta = \ldots \tag{28}$$

$$r = \frac{z_o}{\cos\alpha} \tag{29}$$

$$x_p = x_o - r \sin\alpha \cos\left(\frac{\pi}{2} - \beta\right) \tag{30}$$

$$y_p = y_o - r \sin\alpha \sin\left(\frac{\pi}{2} - \beta\right) \tag{31}$$

where we have assumed $z_p = 0$.
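The range and stare point computation in step 7 can be sketched as below. The numerator of the range expression is not legible in the source; the code assumes it is the camera height $z_o$ (with $z_p = 0$), which is an assumption that reproduces the refined values reported for image M38 to within the rounding of the angles. The function name is hypothetical.

```python
import math

def world_parameters(x_o, y_o, z_o, alpha_deg, beta_deg):
    """Return (r, x_p, y_p) from refined camera position and viewing angles.

    Assumes z_p = 0 and r = z_o / cos(alpha), with alpha measured from nadir.
    """
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    r = z_o / math.cos(a)
    ground = r * math.sin(a)  # horizontal distance from stare point to camera
    x_p = x_o - ground * math.cos(math.pi / 2.0 - b)
    y_p = y_o - ground * math.sin(math.pi / 2.0 - b)
    return r, x_p, y_p

# Refined parameters reported for image M38.
r, x_p, y_p = world_parameters(7977.0, 309.0, 7776.0, 42.6, 93.3)
print(round(r), round(x_p), round(y_p))
```

The small differences from the reported $r = 10563$, $x_p = 839$ and $y_p = 719$ are expected, since the angles used here are rounded to 0.1°.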

3.2.7.  Experiments 
Camera  Roll  Estimation 

For each of the forty model-board-2 images, we manually select two points along the north
direction and compute $N_i$ and $\gamma$ using (21) and (22). The results are listed in Table 1.

Camera Parameter Calibration

Given the camera viewing direction $(\alpha, \beta)$, the camera range $r$, and the estimated
camera roll angle, we use (11-13) and (14-16) to compute the camera parameters for the conformal
representation. The results are listed in Table 1.

We then apply camera resection, based on these initial camera parameters (assuming
$x_p = y_p = z_p = 0$) and correspondences of five control points, to obtain refined camera
parameters $x_o$, $y_o$, $z_o$, $\omega$, $\phi$ and $\kappa$. We further compute the corresponding
refined camera parameters in world coordinates, $\alpha$, $\beta$, $\gamma$, $r$, $x_p$ and $y_p$,
using (18-19) and (28-31). Table 2 lists the refined camera parameters for all the model-board-2
images. Table 3 lists the differences between the given and refined camera parameters.

Figure 4 shows an example of registering a new image, M38, shown in (a), to the existing
site model. The estimated camera roll angle for M38 is $\gamma = 90.1^\circ$. The given camera
elevation and azimuth angles are $\alpha = 40^\circ$ and $\beta = 90^\circ$. The approximate range is
$r = 10850$ feet. From these we compute the initial camera parameters as $x_o = 6974$ feet,
$y_o = 0$ feet, $z_o = 8311$ feet, $\omega = 0.0^\circ$, $\phi = 40.0^\circ$, and $\kappa = 0.1^\circ$. In
computing $x_o$, $y_o$ and $z_o$, we set the unknown parameters $x_p$, $y_p$ and $z_p$ to zero.
Figure 4(b) shows the projection of the existing site model into the new image using the above
approximate camera parameters. In many applications an approximate stare point is available.
Figure 4(c) shows the projection of the existing site model into the new image domain when an
approximate stare point is available.
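As a numerical check of the initial camera position, the sketch below converts the given viewing direction and range into camera coordinates. Equations (11-13) are not reproduced in this section, so the spherical form used here is an assumption: $\alpha$ is taken as the angle from nadir, $\beta$ as the azimuth east of north, and the stare point is set to zero. The form agrees with the Table 1 entries for M4 and M38; the function name is hypothetical.

```python
import math

def initial_camera_position(alpha_deg, beta_deg, r):
    """Return (x, y, z) of the camera relative to the stare point.

    Assumed form: x = r sin(alpha) sin(beta), y = r sin(alpha) cos(beta),
    z = r cos(alpha), with the stare point at the origin.
    """
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    ground = r * math.sin(a)  # horizontal distance from the stare point
    return ground * math.sin(b), ground * math.cos(b), r * math.cos(a)

# Image M38: alpha = 40 degrees, beta = 90 degrees, r = 10850 feet.
x, y, z = initial_camera_position(40.0, 90.0, 10850.0)
print(round(x), round(y), int(z))
```

For M38 this gives values matching $x_o = 6974$, $y_o = 0$ and $z_o = 8311$ feet up to rounding, and for M4 ($\alpha = 30^\circ$, $\beta = 75^\circ$) it gives values matching 5240, 1404 and 9396 feet.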
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Table  1:  Initial  Camera  Parameters 


image  N_i(°)  α(°)  β(°)   r(ft)     γ(°)  x_or(ft)  y_or(ft)  z_or(ft)    ω(°)    φ(°)     κ(°)
M1      226.2    30   340       …        …     -1855         …      9396   -28.5    -9.8   -141.5
M2       28.7    45    54       …        …      6206         …      7672   -30.4    34.9     79.9
M3        1.0     0    34       …    123.0         0         0     10850     0.0     0.0     89.0
M4      357.9    30    75   10850   -191.0      5240      1404      9396    -8.5    28.9     96.2
M5      357.3    30   255   10850    -10.4     -5240     -1404      9396     8.5   -28.9     96.8
M6       88.6    15    86   10850     87.5      2801       195     10480    -1.1    15.0      1.6
M7       9?.2    15   333   10850    -30.0     -1274      2502     10480   -13.4    -6.7     -3.8
M8      285.0    45   352   10850   -206.2     -1067      7597      7672   -44.7    -5.6    159.4
M9      269.5    35   180   10850      0.5         0     -6223      8887    35.0     0.0   -179.5
M10     185.4    30   223   10850   -228.3     -3699     -3967      9396    22.9   -19.9    -87.2
M11     206.5     0   100   10850    -16.5         0         0     10850     0.0     0.0   -116.5
M12     356.8    45   317   10850   -319.6     -5232      5611      7672   -36.2   -28.8     73.8
M13       0.1    30    51   10850    144.9      4216      3414      9396   -20.0    22.9     98.0
M14     352.5    40   356   10850   -267.7      -486      6957      8311   -39.9    -2.6     95.4
M15      95.3    30   210   10850   -151.6     -2712     -4698      9396    26.6   -14.5      1.8
M16     287.2    45    29   10850   -159.1      3719      6710      7672   -41.2    20.0    179.5
M17     183.3    25   100   10850      5.8      4515      -796      9833     4.6    24.6    -95.2
M18     354.3    30    10   10850   -252.8       942      5342      9396   -29.6     5.0     98.5
M19     222.5    30    40   10850    -88.4      3487      4155      9396   -23.9    18.7   -124.4
M20     227.8    25   186   10850   -311.2      -479     -4560      9833    24.9    -2.5   -136.7
M21     358.7    30    75   10850   -191.7      5240      1404      9396    -8.5    28.9     95.4
M22       0.3    30   255   10850    -13.4     -5240     -1404      9396     8.5   -28.9     93.8
M23     261.4    45   317   10850   -224.2     -5232      5611      7672   -36.2   -28.8    169.2
M24     176.4    25    80   10850     -5.4      4515       796      9833    -4.6    24.6    -84.4
M25     174.5    30   146   10850   -302.4      3033     -4497      9396    25.6    16.2    -92.1
M26      84.1    45     9   10850     18.5      1200      7577      7672   -44.6     6.4     12.1
M27     349.2    45    80   10850   -176.3      7555      1332      7672    -9.9    44.1    107.7
M28     113.8    15    15   10850     -8.3       726      2712     10480   -14.5     3.8    -22.8
M29     270.3    35   278   10850   -263.7     -6162       866      8887    -5.6   -34.6    176.6
M30     273.5    25   186   10850      3.1      -479     -4560      9833    24.9    -2.5    177.7
M31     351.1    30   350   10850   -272.6      -942      5342      9396   -29.6    -5.0     96.1
M32      79.5    45   115   10850   -241.2      6953     -3242      7672    22.9    39.9     -4.6
M33     186.0    30    40   10850    -51.9      3487      4155      9396   -23.9    18.7    -87.9
M34     359.1    15    55   10850   -213.2      2300      1610     10480    -8.7    12.2     92.7
M35      88.5    45   300   10850    -66.3     -6644      3836      7672   -26.6   -37.8    -15.6
M36     278.1    30    70   10850   -115.6      5097      1855      9396   -11.2    28.0    177.2
M37      12.4    30   115   10850   -170.4      4916     -2292      9396    13.7    26.9     71.3
M38      89.9    40    90   10850   -269.9      6974         0      8311     0.0    40.0      0.1
M39     269.4    30   165   10850    -16.6      1404     -5240      9396    29.1     7.4    176.5
M40       7.6    40   220   10850    -50.0     -4482     -5342      8311    32.7   -24.4     97.3
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Table  2:  Camera  Parameters  after  Camera  Resection 


image    α(°)    β(°)   r(ft)     γ(°)  x_o(ft)  y_o(ft)  z_o(ft)    ω(°)    φ(°)     κ(°)
M1       30.7   -20.8   10321   -162.4     -681     5751     8871   -29.1       …        …
M2       45.8    51.5   10586    123.7     6715     5350     7379   -32.6       …        …
M3        1.7  -103.0   10361    -15.5      788      177    10356     0.4       …     87.4
M4       31.3    85.6   10208    176.8     6707      608     8719    -2.7    31.2     91.9
M5       29.9  -108.1   10536    -15.1    -3574    -1395     9133    10.1   -28.3     95.5
M6       20.1    94.6   10439     95.6     4219      194     9802     1.7    20.1      0.7
M7       15.9     3.6   10423      2.9      762     3448    10022   -15.9     1.0     -0.6
M8       44.8    -6.7   10384    150.1       74     7977     7368   -44.6    -4.7    154.8
M9       35.1  -177.7   10156      2.3      793    -5614     8306    35.1    -1.3   -179.5
M10      29.2  -144.1   10773    123.8    -2179    -3815     9404    24.3   -16.6    -88.6
M11       6.5   125.7   10764      7.7     1969      143    10694     3.8       …   -118.2
M12      45.7   -47.2   10636     32.8    -4762     5833     7431   -34.8       …     69.9
M13      29.2    78.8   10791    168.0     6097     1496     9417    -6.2       …     90.8
M14      46.4    -3.2   10486     88.2      744     8505     7233   -46.3    -2.3     90.4
M15      30.7  -155.8   10541   -158.2    -1277    -4577     9067    28.4   -12.1      0.6
M16      47.0    36.3   10706   -154.1     5772     7219     7297       …    25.7    179.2
M17      24.1   121.1   11061     24.2     4782    -1702    10095    13.0    20.5    -99.3
M18      29.0   -27.3   10873     61.6    -1396     5551     9511   -26.2   -12.8     85.9
M19      33.0    33.5   10285    -97.1     4237     5309     8623   -28.5    17.5   -126.1
M20      26.0  -173.0   10307     48.6      484    -4068     9265    25.8    -3.1   -137.8
M21      31.7    92.0   10313   -178.1     6641      727     8774     1.3    31.7     89.5
M22      29.9  -107.2   10646    -16.3    -3561     -694     9225     9.7   -28.5     93.4
M23      45.0   -50.7   10359    125.4    -4844     5352     7327   -32.3   -33.2    166.2
M24      24.6    90.2   10396      2.2     5305      209     9454     0.1    24.6    -88.0
M25      29.0   165.5   10322     78.3     2125    -4583     9027    28.2     7.0    -88.9
M26      45.8     9.2   10705     14.2     2385     8350     7467   -45.4     6.6      7.8
M27      45.1    81.3   10670   -176.9     8585     1762     7529    -8.7    44.5    105.4
M28      15.8    25.1   10856      0.0     1959     3154    10445   -14.4     6.6    -24.3
M29      34.9   -90.8   10768     88.1    -5293      620     8828     0.6   -34.9    179.1
M30      24.4  -164.6   10630     12.7     -266    -3592     9678    23.7    -6.3    178.6
M31      34.3   -11.4   10583     80.2     -382     6755     8742   -33.8    -6.4     89.7
M32      45.0   129.2   10747    129.4     7079    -4306     7592    32.4    33.2     -9.8
M33      33.6    36.9   10818    -57.1     4719     5597     9011   -28.0    19.4    -89.1
M34      15.3    69.4   10985    158.6     3830     1539    10596    -5.5    14.3     89.9
M35      45.6   -69.5   10962    -75.9    -6414     3280     7664   -19.7   -42.1    -14.0
M36      31.8    58.5   10756   -127.6     6077     3260     9146   -17.9    26.7    178.1
M37      31.4   133.7   10347   -154.5     5106    -3041     8835    22.8    22.1     67.3
M38      42.6    93.3   10563     91.9     7977      309     7776     3.0    42.5     -2.6
M39      33.0   159.5   10649    -24.3     3062    -4889     8935    31.3    11.0    173.1
M40      44.7  -105.7   10805    -19.9    -6508    -1440     7679    15.0   -42.6     91.6

Table  3:  Corrections  for  Camera  Parameters  after  Camera  Resection 


image  δα(°)   δβ(°)  δr(ft)
M1       0.7    -0.8    -529
M2       0.8    -2.5    -264
M3         …  -137.0    -489
M4       1.3    10.6    -642
M5      -0.1    -3.1    -314
M6       5.1     8.6    -411
M7       0.9    30.6    -427
M8      -0.2     1.3    -466
M9       0.1     2.3    -694
M10     -0.8       …     -77
M11      6.5       …     -86
M12        …       …    -214
M13        …    27.8     -59
M14        …     0.8    -364
M15      0.7    -5.8    -309
M16      2.0       …    -144
M17     -0.9    21.1     211
M18     -1.0   -37.3      23
M19      3.0       …    -565
M20      1.0     1.0    -543
M21      1.7    17.0    -537
M22     -0.1       …    -204
M23      0.0       …    -491
M24     -0.4       …    -454
M25     -1.0       …    -528
M26      0.8       …    -145
M27      0.1       …    -180
M28      0.8       …       6
M29     -0.1       …     -82
M30     -0.6       …    -220
M31      4.3    -1.4    -267
M32      0.0    14.2    -103
M33      3.6    -3.1     -32
M34      0.3    14.4     135
M35      0.6    -9.5     112
M36      1.8   -11.5     -94
M37      1.4    18.7    -503
M38      2.6     3.3    -287
M39      3.0    -5.5    -201
M40      4.7    34.3     -45


-138.5 

7.8 


1 


-164 

216 

-69 

-996 

-225 

-484 

353 

-331 

384 

-289 

-715 

-439 

-2389 

624 

-205 

-398 

-1542 

-659 

518 

80 

-1597 

-167 

-975 

-811 

-348 

-5 

-186 

-34 

-951 

321 

504 

-1569 

633 

-590 

-1095 

1100 

-1424 

-409 

-187 

3287 


3.4 

-17.8 

-12.6 

-4.6 

-1.2 

-1.7 

0.9 

-0.6 

-1.1 

9.8 

2.8 

-5.9 

1.2 

0.4 

o 

1 

3.9 

-4.4 

-3.0 

0.0 

-9.2 

-3.6 

3.2 

-0.8 

0.2 

-4.3 

1.2 

0.4 

-2.3 

0.1 

2.8 

-1.5 

6.2 

-0.3 

2.5 

-1.2 

-3.8 

0.9 

-4.2 

-1.4 

-6.4 

9.5 

-6.7 

-5.2 

-4.1 

0.7 

-1.2 

3.2 

2.1 

-2.8 

6.9 

-4.3 

1.6 

-6.7 

-1.3 

0.9 

9.1 

-4.8 

-4.0 

3.0 

2.5 

-2.7 

2.2 

3.6 

-3.4 

-17.7 

-18.2 

-5.7 

-4.3  1414  200 
-1.3  1419  234 
-0.9  638  483 

3.2  581  593 

-4.6  924  710 

0.0  1024  224 
-1.4  904  440 

-1.7  974  859 

819  661 
929  471 
-5.0  1164  923 
-1.2  930  325 

-0.3  1131  907 
-4.1  913  635 


1225  920 
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(b)  Initial  projection  of  the  site  model  into  the  new  image 


Figure  4:  Registering  a  new  image  to  the  site  model. 
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c)  Initial  projection  of  the  site  model  when  the  stare  point  is  available 


(d)  Site  model  projection  after  camera  resection 
Figure  4:  (cont.)  Registering  a  new  image  to  the  site  model. 
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For  each  of  the  cases  shown  in  (b)  and  (c),  we  then  manually  select  five  control  points, 
adjust them to the correct positions in the new image domain, and perform camera resection to
get more accurate camera parameters. Figure 4(d) shows the projection of the site model
into the new image after camera resection. The refined camera parameters are: $x_o = 7977$
feet, $y_o = 309$ feet, $z_o = 7776$ feet, $\omega = 3.0^\circ$, $\phi = 42.5^\circ$, $\kappa = -2.6^\circ$.
Accordingly, we have $\alpha = 42.6^\circ$, $\beta = 93.3^\circ$, $\gamma = 91.9^\circ$, $x_p = 839$,
$y_p = 719$, and $r = 10563$.

The  following  observations  about  the  given  camera  parameters  can  be  made  from  Table  3: 

1. The initial camera elevation angles are relatively accurate, with a maximum error of
6.4°.

2. The errors in the camera azimuth angles are relatively large, but except for image M3,
which is viewed from the nadir direction ($\alpha = 1.7^\circ$), the errors in camera azimuth angle
are within ±38°. The camera azimuth error for M3 is −137.0°.

3. It would be useful to know the camera stare point.

4. Overall, the initial camera parameters obtained by (11-13) and (14-16) give a good
initial set of parameters for camera resection.


Control  Point  Verification 


Using the refined camera parameters obtained above, we further evaluated the control points
provided with the model board 2 data. It was found that except for point #265, the east
intersection of the curved and straight tracks, and some typographical errors, the 3-D
coordinates of the control points are quite accurate. The following errors were found and
corrected in our experiments.

1.  The  minus  signs  of  the  y  coordinates  of  points  #20,  #22,  and  #220  are  missing. 

2.  Point  #38  is  not  marked.  It  should  be  added  next  to  point  #17. 

3.  Point  #78  is  not  given;  its  mark,  on  building  B9,  should  be  removed. 

4. The marks for points #231, #234 and #238 are not correct; the control points are
located along the junction of the wall and the roof.

5. Point #237 is not correct; it should be on the ground just under the apex of the dormer
window.


Based on our camera resection results, we found that the correct y value for point #265
should be about 2.9626 inches.¹ We also computed the correspondences (image domain indices)
$(X, Y)$ of the control points on each image:

$$X = \frac{f}{e} \cdot \frac{r_{11}(x - x_o) + r_{12}(y - y_o) + r_{13}(z - z_o)}{r_{31}(x - x_o) + r_{32}(y - y_o) + r_{33}(z - z_o)} \tag{32}$$

$$Y = \frac{f}{e} \cdot \frac{r_{21}(x - x_o) + r_{22}(y - y_o) + r_{23}(z - z_o)}{r_{31}(x - x_o) + r_{32}(y - y_o) + r_{33}(z - z_o)} \tag{33}$$

where $f$ is the camera focal length, $e$ is the pixel spacing in the image plane, $x_o$, $y_o$
and $z_o$ are the camera shift parameters obtained from camera resection, and
$r_{11}, \ldots, r_{33}$ are computed using either (2) or (7). The results are listed in Table 4.
Points which are out of the frame are printed in Table 4 as dashes, and points which are
invisible (blocked by other buildings) are bracketed.

¹The y value for point #265 was not adjusted in computing the correspondences listed in Table 4.
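The projection of a control point into the image domain can be sketched as follows. The code is a generic collinearity-style projection under assumed conventions: the rotation matrix is supplied by the caller as a list of rows, the ratio of focal length to pixel spacing is passed as a single factor, and no sign flip or principal-point offset is applied. The function name and the sample values are illustrative, not taken from the report.

```python
def project_point(world, camera, rot, f_over_e):
    """Project a 3-D world point to image indices (X, Y).

    world, camera: (x, y, z) tuples; rot: 3x3 rotation matrix as a list of rows;
    f_over_e: focal length divided by pixel spacing.
    """
    d = [world[i] - camera[i] for i in range(3)]
    num_x = sum(rot[0][j] * d[j] for j in range(3))
    num_y = sum(rot[1][j] * d[j] for j in range(3))
    den = sum(rot[2][j] * d[j] for j in range(3))
    if den == 0:
        raise ValueError("point lies in the plane through the camera center")
    return f_over_e * num_x / den, f_over_e * num_y / den

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project_point((1.0, 2.0, 10.0), (0.0, 0.0, 0.0), identity, 100.0))
```

A real image also needs an origin offset and an axis convention for the pixel grid; those depend on the sensor description and are left out of this sketch.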

3.3.  Region  Delineation 

Given an image to be exploited, an important step in site-model-supported image monitoring
is to quickly locate the regions of interest according to the tasks in the QL profile,
thereby narrowing the search area and reducing computation and false alarms. We have
developed two algorithms for quickly delineating regions of interest. When 3-D features for
the regions of interest are available from the site model, we can quickly compute the “valid”
portion(s) of the regions in the current image domain, and fill the regions of interest with
appropriate labels. For regions such as roads and parking lots, we further compute their
directions, which are useful for vehicle detection. When the region of interest is available
from another image, we use the image-to-image transform equations (60-61) in Section 3.7
to quickly transform the corresponding region to the new image domain. Two examples are
presented to illustrate the region delineation step.

In Figure 5 the delineation of regions of interest (roads) from corresponding features
stored in the site model is shown: (a) an image to be monitored; (b) regions corresponding
to roads in the site model; (c) direction maps for the road region; and (d) the image
of the region of interest.

In Figure 6 the delineation of the region of interest (a storing garage) from a region map
associated with an earlier image is shown: (a) and (b) the earlier image and its region map;
(c) the region of the storing garage in the old image; (d) the new image to be exploited; (e)
the region of the storing garage in the new image delineated using our algorithm; and (f)
the image of the region of interest.

3.4.  Integration  into  RCDE 

In the past year we have spent a considerable amount of effort on mastering the RCDE. As
of the time of preparation of this report, we have made the following progress on integration
into RCDE:

1. We have successfully installed the latest version of RCDE on all our SPARC-10
workstations, including the latest upgrades from SRI.

2. Using our recently developed image-to-site-model registration algorithm, we have built
a site model for model board 2 which includes all forty images.

3. We have added new IU functions into RCDE which can be invoked from augmented
menus. Figure 7 shows several new functions that we have added into RCDE:

•  SNF filter.

Table 4: Correspondences from given world coordinates

M37 - - - - 880 688 (738 536) 756 590 743 526 751 601 (778 286) 784 275 888 294 955 307 1004 316 1119 337

M38 - - - - 1065 393 (920 472) 960 484 906 471 975 485 725 336 710 335 743 235 766 170 782 122 821 10

M39 341 956 338 947 413 699 (542 637) 498 643 545 645 495 635 (707 699) 711 707 681 765 662 801 648 828 615 891

M40 978 747 (974 738) 754 735 (719 656) 713 693 722 664 (710 …) 801 542 805 551 855 562 868 568 911 574 966 586

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(b)  Road  regions  delineated  using  site  model 


Figure  5:  Region  delineation  using  a  site  model 
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(d)  Image  to  be  monitored 


Figure  5:  (cont.)  Region  delineation  using  a  site  model 
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(b)  Region  map  associated  with  the  earlier  image 


Figure  6:  Image  delineation  using  an  associated  map 
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Figure  6:  (cont.)  Image  delineation  using  an  associated  map 
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(e)  Region  of  interest  in  the  new  image 


(f)  Image  of  the  region  of  interest 


Figure 6: (cont.) Image delineation using an associated map
Figure 7: Some functions added to RCDE


•  Load raw formatted image,

•  Save image in various formats,

•  UM-Canny edge detector,

•  Vehicle detection.

4.  We  have  detected  and  corrected  several  minor  bugs  in  the  RCDE  source  code.  We 
have  been  in  contact  with  the  RCDE  developers  at  Martin  Marietta,  King  of  Prussia, 
PA. 

3.5.  Monitoring  Construction  Activities 

Two site-model-supported monitoring tasks have been considered in our system. The first
is monitoring new construction activities, using cylindrical structures as an example. The
second is detecting and counting vehicles in a garage area, on roads, and in a training
ground. In this section we discuss the subsystem for monitoring construction activities:
the low level features used, the representation of target objects in terms of the low
level features, the feature extraction scheme, hypothesis generation, and hypothesis
verification. An experimental result on monitoring a new cylindrical structure in model
board images is presented.

3.5.1.  Low  Level  Feature  Extraction: 

Edge  detection:  A  Canny  edge  detector  [3]  is  first  used  to  get  an  edge  map  and  a  gradient 
direction  for  each  edge  pixel.  We  have  found  that  the  edge  map  is  more  reliable  than  region 
segmentation  output,  especially  when  they  are  used  to  search  for  objects  in  a  cluttered  image. 

Line  linking:  We  apply  a  line  linking  program  [15]  to  the  edge  detector  output.  In  doing 
this,  we  first  scan  the  edge  map  and  group  the  edge  pixels  according  to  some  predefined 
templates.  We  then  merge  small  collinear  line  fragments  into  long  straight  lines. 

3.5.2.  Object  Representation  for  Cylinders 

We use a hierarchically parameterized object model to incorporate knowledge from the site
model and the image acquisition conditions into the low level processes. For example, a 3-D
model is designed to accommodate prior information about a 3-D cylinder on the ground
plane as follows:

(a) height of the cylinder, $h_{3d} \in (h_{min}, h_{max})$

(b) radius of its cross-section, $r_{3d} \in (r_{min}, r_{max})$

(c) center of its base $(x_w, y_w, 0)$

(d) center of its apex $(x_w, y_w, h_{3d})$

In the above object model definition, the constraints on a 3-D cylindrical object become
part of the object model and are independent of camera pose. $(h_{min}, h_{max})$ is the range of
the height of a candidate cylinder, and $(r_{min}, r_{max})$ is the range of the radius of its
cross-section. We further model the contour of a cylinder as an ellipse, a pair of parallel
lines, and some geometric relations between them. In doing this, we transfer the 3-D object
model onto the following 2-D object model, which depends on the camera parameters, and use it
as a working template for detecting cylinders.

1. Ellipse:

(a) center $c = (x_0, y_0) \in A$, where $A$ is the area of projection of the set of 3-D points
of the form $(x_w, y_w, h_{3d})$ on the image plane.

(b) length of the semi-major axis, $a = r_{3d} \times S_c$.

(c) length of the semi-minor axis, $b = r_{3d} \times S_c \times \cos\alpha$.

(d) orientation $= \gamma$.

2. Pairs of parallel lines:

(a) symmetry axis, $VL$;

(b) length, $h_{3d} \times S_c \times \sin\alpha$.

(c) separation, $2 r_{3d} \times S_c$.

(d) orientation, $\frac{\pi}{2} + \gamma$.

3. Geometric constraint(s).

The center of the ellipse should be close to the symmetry axis of the parallel lines.

$S_c$ is a scale factor derived from the camera focal length and image resolution.
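The mapping from the 3-D cylinder model to the 2-D working template can be sketched as a small function. The elevation angle is taken as measured from nadir, and the line separation is assumed to be the projected diameter (twice the radius times the scale factor); the coefficient of the separation is not legible in the source, so that line is an assumption. The function and key names are hypothetical.

```python
import math

def cylinder_template(r3d, h3d, scale, alpha_deg, gamma_deg):
    """Return 2-D template parameters for a cylinder of radius r3d and height h3d.

    scale: scale factor from focal length and image resolution.
    alpha_deg: camera elevation angle (assumed measured from nadir).
    gamma_deg: camera roll angle.
    """
    a = math.radians(alpha_deg)
    return {
        "semi_major": r3d * scale,
        "semi_minor": r3d * scale * math.cos(a),
        "ellipse_orientation_deg": gamma_deg,
        "line_length": h3d * scale * math.sin(a),
        "line_separation": 2.0 * r3d * scale,  # assumption: projected diameter
        "line_orientation_deg": 90.0 + gamma_deg,
    }

template = cylinder_template(r3d=10.0, h3d=20.0, scale=2.0, alpha_deg=60.0, gamma_deg=0.0)
for name, value in template.items():
    print(name, round(value, 3))
```

Evaluating the function at the ends of the height and radius ranges gives the bounds that the detected ellipse axes and line lengths are expected to fall within.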

3.5.3.  Primitive  Feature  Detection 

Primitive features are the building blocks used to describe the objects. They are useful for
locating possible objects. A robust primitive feature extractor is crucial for successful
target detection. The following three primitive feature extractors have been implemented for
cylindrical object detection.

Circle detection: A circle is of the form

$$(X - x_0)^2 + (Y - y_0)^2 = r^2$$

where $(x_0, y_0)$ is the center of the circle and $r$ is its radius. A traditional approach to
circle detection is the generalized Hough transform [2, 12], which requires a huge amount of
memory. We have defined a two-stage template matching scheme for circle detection. In the
first stage, edge templates are used to determine possible candidate centers. In the second
stage, gradient direction templates are used to re-inspect the selected candidate center points.
The details are as follows:
Edge template matching: For each $r$, we form a search space, $QA$, by quantizing the
angular range $[0^\circ, 360^\circ]$ into $10 \times r$ levels. The radius vector, in the Cartesian coordinate
system is

$$V(\psi) = (r\cos\psi,\; r\sin\psi), \quad \psi \in QA \tag{34}$$

It is used as the edge template for a circle. We then apply this template to the edge map and
obtain candidate centers which have sufficient numbers of supporting pixels around them.

Gradient direction template matching: The gradient direction of a boundary pixel is
the direction from the center to the pixel. We apply both the edge and gradient direction
templates to each candidate circle, allowing a three pixel wide tolerance band on the edge
template to accommodate slightly misplaced pixels. For an edge pixel to be a supporting
pixel, the pixel must fall within the tolerance band and have a gradient direction consistent
with the gradient template. We accept those candidates whose consistency scores are above
the high threshold for a circle. For candidates whose consistency scores fall between the
high and low thresholds, we further apply a radius histogram test: if we plot a histogram of
intensity as a function of distance from the candidate center, there should be a steep slope
around the radius of the circle. The center and the radius of the successful candidate are
then stored as $Cir_k$.
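The two-stage scheme can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the implemented detector: the edge map is a dictionary from pixel coordinates to gradient direction in radians, gradient directions are compared modulo π so that either edge polarity is accepted, the threshold values are arbitrary, and the radius histogram test for borderline candidates is omitted. All names are hypothetical.

```python
import math

def circle_template(r):
    """Quantize the full angular range into 10 * r template angles (radians)."""
    n = 10 * r
    return [2.0 * math.pi * k / n for k in range(n)]

def angle_diff_mod_pi(a, b):
    """Smallest difference between two directions, ignoring polarity."""
    d = (a - b) % math.pi
    return min(d, math.pi - d)

def edge_support(edges, cx, cy, r, angles):
    """Stage 1: fraction of template positions that land on an edge pixel."""
    hits = 0
    for psi in angles:
        p = (round(cx + r * math.cos(psi)), round(cy + r * math.sin(psi)))
        if p in edges:
            hits += 1
    return hits / len(angles)

def gradient_support(edges, cx, cy, r, angles, max_err=0.35):
    """Stage 2: fraction of template positions with a consistent edge pixel
    inside a three pixel wide band around the template radius."""
    hits = 0
    for psi in angles:
        for dr in (-1, 0, 1):
            p = (round(cx + (r + dr) * math.cos(psi)),
                 round(cy + (r + dr) * math.sin(psi)))
            g = edges.get(p)
            if g is not None and angle_diff_mod_pi(g, psi) <= max_err:
                hits += 1
                break
    return hits / len(angles)

def detect_circles(edges, r, centers, stage1=0.5, high=0.8):
    """Return (cx, cy, r) for candidate centers that pass both stages."""
    angles = circle_template(r)
    found = []
    for cx, cy in centers:
        if edge_support(edges, cx, cy, r, angles) < stage1:
            continue  # not enough supporting edge pixels
        if gradient_support(edges, cx, cy, r, angles) >= high:
            found.append((cx, cy, r))
    return found

# Synthetic edge map: a circle of radius 8 centered at (20, 20).
edges = {}
for psi in circle_template(8):
    x = round(20 + 8 * math.cos(psi))
    y = round(20 + 8 * math.sin(psi))
    edges[(x, y)] = math.atan2(y - 20, x - 20)

centers = [(cx, cy) for cx in range(15, 26) for cy in range(15, 26)]
print((20, 20, 8) in detect_circles(edges, 8, centers))
```

Because only a list of candidate centers and one template per radius are kept, memory use stays small compared with a three-parameter accumulator; neighbouring centers may also pass the tests, so a real detector would add non-maximum suppression.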


Ellipse detection: An ellipse is of the form

$$\frac{(X - x_0)^2}{a^2} + \frac{(Y - y_0)^2}{b^2} = 1 \tag{35}$$

The scheme we use for detection of an ellipse is similar to the scheme used for circle detection.
The difference lies in the way we generate edge templates and gradient direction templates.
To simplify the discussion, assuming that the major axis of the ellipse is parallel to the x-axis
($\gamma = 0$), we define the following templates:

Edge templates: The edge pixels for an ellipse satisfy

$$V(\psi) = (a\cos\psi,\; b\sin\psi), \quad \psi \in QA \tag{36}$$

Note that the definition of $\psi$ in (36), shown in Figure 8, is different from the definition of
$\psi$ in (34).

Gradient direction template: As shown in Figure 8, for an ellipse, $V(\psi)$ corresponds
to point $n$ (instead of $m$). The gradient orientation $\theta$ is determined by

$$\tan\theta = \frac{\Delta y}{\Delta x} = \frac{a}{b} \times \tan\psi \tag{37}$$

We define the gradient direction template as

$$G(\psi) = \arctan\left(\frac{a}{b} \times \tan\psi\right), \quad \psi \in QA \tag{38}$$
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Figure  8:  Ellipse 


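When the camera roll angle is non-zero ($\gamma \neq 0$), (36) and (38) become

$$V(\psi) = (a\cos\psi\cos\gamma - b\sin\psi\sin\gamma,\; a\cos\psi\sin\gamma + b\sin\psi\cos\gamma), \quad \psi \in QA \tag{39}$$

$$G(\psi) = \arctan\left(\frac{a}{b} \times \tan\psi\right) + \gamma, \quad \psi \in QA \tag{40}$$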
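The ellipse templates can be generated with a few lines of code. This sketch assumes the parametric form of the edge template (the point at parameter $\psi$ is $(a\cos\psi, b\sin\psi)$ before rotation by the roll angle), uses `atan2` instead of a plain arctangent so that the gradient direction lands in the correct quadrant, and takes the number of angular levels as $10 \times a$ by analogy with the circle case. These choices and the function name are assumptions.

```python
import math

def ellipse_templates(a, b, gamma_deg=0.0, levels=None):
    """Return a list of (dx, dy, gradient_direction) template entries.

    a, b: semi-major and semi-minor axes in pixels.
    gamma_deg: camera roll angle, which rotates the ellipse.
    levels: number of angular samples (assumed default: 10 * a).
    """
    g = math.radians(gamma_deg)
    n = levels if levels is not None else int(10 * a)
    entries = []
    for k in range(n):
        psi = 2.0 * math.pi * k / n
        x, y = a * math.cos(psi), b * math.sin(psi)
        # Rotate the edge position by the roll angle.
        dx = x * math.cos(g) - y * math.sin(g)
        dy = x * math.sin(g) + y * math.cos(g)
        # tan(theta) = (a / b) * tan(psi); atan2 keeps the quadrant of psi.
        grad = math.atan2(a * math.sin(psi), b * math.cos(psi)) + g
        entries.append((dx, dy, grad))
    return entries

for dx, dy, grad in ellipse_templates(4.0, 2.0, gamma_deg=0.0, levels=8)[:3]:
    print(round(dx, 3), round(dy, 3), round(math.degrees(grad), 1))
```

The resulting entries can be used in the same two-stage matching as the circle templates, with the edge offsets voting for centers and the gradient directions used for re-inspection.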

Line  grouping:  For  line  grouping,  we  use  the  constraints  from  the  camera  model  to  check 
candidate  parallel  lines.  Since  the  silhouette  of  a  cylinder  is  always  projected  along  the 
camera  viewing  direction,  we  ignore  lines  which  are  oriented  far  away  from  the  expected 
direction.  As  shown  in  Figure  9,  two  lines,  Li  and  Lj,  form  a  parallel  line  pair,  Paraij,  if 
they  satisfy  the  following  constraints: 


Figure  9:  Line  grouping 


1.  parallelism:  \0i  —  9 i\  <  eg 


2.  distance:  dist{Li,Lj)  €  (r, 


mm) '  TOMJt/ 


where 


dist{Li,Lj)  =  -\pLdist{Mi,Lj)  +  pl-dist{Mj,Li)] 


Oi  is  the  orientation  of  the  hne,  te  is  the  angle  deviation  threshold,  zmd  pLdi3t{Mp,  Lg) 
is  the  distance  from  point  Mp  to  line  Lg.  For  each  pair  of  parallel  Unes,  we  further  compute 
their  axis  of  symmetry  VL{6ij,  Mij),  which  satisfies 

li  X  6i  Ij  X  6j 

h - tTo — 


+  (42) 

where  6ij  is  the  orientation  of  the  symmetry  axis,  M,j  is  a  point  on  the  symmetry  axis,  and 
/,•  is  the  length  of  the  line  segr'.ent.  We  then  define  the  overlap  between  Li  and  Lj  as 

$$Overlap(L_i, L_j) = \frac{l_i + l_j}{2.0 \times l_{ij}} \qquad (43)$$

where $l_{ij}$ is the distance between p1 and p4 in Figure 9. In our implementation, only line
pairs with overlaps greater than 0.5 are retained as valid parallel line pairs.
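
The line grouping tests can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the system: each line is a pair of endpoints, $M_i$ is assumed to be the midpoint of $L_i$, and the distance between p1 and p4 is assumed to be the extent of the four endpoints along the direction of the first line. The wrap-around of orientation differences at π and all threshold values are also assumptions.

```python
import math

def orientation(seg):
    """Orientation of a segment in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def seg_length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def midpoint(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def pl_dist(p, seg):
    """Distance from point p to the infinite line through seg."""
    (x1, y1), (x2, y2) = seg
    cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
    return abs(cross) / seg_length(seg)

def pair_dist(si, sj):
    """Average of the two midpoint-to-line distances."""
    return 0.5 * (pl_dist(midpoint(si), sj) + pl_dist(midpoint(sj), si))

def overlap(si, sj):
    """(l_i + l_j) / (2.0 * l_ij), with l_ij the extent of all endpoints along si."""
    (x1, y1), (x2, y2) = si
    li = seg_length(si)
    ux, uy = (x2 - x1) / li, (y2 - y1) / li
    proj = [px * ux + py * uy for px, py in (*si, *sj)]
    lij = max(proj) - min(proj)
    return (li + seg_length(sj)) / (2.0 * lij)

def is_parallel_pair(si, sj, eps_theta, r_min, r_max, min_overlap=0.5):
    d_theta = abs(orientation(si) - orientation(sj))
    d_theta = min(d_theta, math.pi - d_theta)  # orientations wrap at pi (assumption)
    return (d_theta < eps_theta
            and r_min < pair_dist(si, sj) < r_max
            and overlap(si, sj) > min_overlap)
```

With this definition two equal-length segments lying side by side have overlap 1, while two segments placed end to end have overlap 0.5 and are rejected by the strict threshold.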


3.5.4.  Perceptual  Grouping  and  Hypothesis  Generation 

With  primitive  features  extracted,  we  detect  possible  locations  of  the  target.  For  each 
primary feature (ellipse candidate, $C_k$), the following constraints are used to search for
supporting secondary features (parallel line pairs, $Para_{ij}$'s):

1. $\max(pp\_dist(O_k, P_s), pp\_dist(O_k, P_e)) \in (\sin\alpha \times Sc \times h_{min}, \sin\alpha \times Sc \times h_{max})$

2. $\mathrm{mod}(|\theta(M_{ij}O_k) - (\frac{\pi}{2} + \gamma)|, 2\pi) < \epsilon_\theta$

where $\theta(M_{ij}O_k)$ is the direction from $M_{ij}$ in (42) to the center of $C_k$, as shown in Figure 10,
and $pp\_dist(P_1, P_2)$ is the distance between points $P_1$ and $P_2$. If a grouping passes the tests,
we evaluate the quality of the grouping by computing


$$H(C_k, Para_{ij}) = \sum_{l=1}^{2} w_l \times H_l(C_k, Para_{ij})$$


where 


Wi  =  W2 


H.(CA,Para,,)  =  (46) 

(47) 

^widtk 

$$H_2(C_k, Para_{ij}) = \frac{\min(pp\_dist(O_k, P_e), pp\_dist(O_k, P_s))}{\max(pp\_dist(O_k, P_e), pp\_dist(O_k, P_s))}$$

If $H(C_k, Para_{ij})$ is less than a threshold, a hypothesis is formed that there is a cylindrical
object located at the corresponding position.



Figure  10:  Perceptual  grouping 

3.5.5.  Hypothesis  Verification 

The hypotheses are then verified by checking for more support from the original edge map,
shadow information, and intensity distribution. The following three tests are used in
cylindrical object detection.

1.  Model  inversion  test:  For  each  candidate  cylinder,  we  fit  a  model  and  check  its 
consistency with the original edge map. If the support is above a threshold, we accept
it  as  a  valid  cylinder.  Otherwise,  we  continue  with  additional  tests. 

2.  Shadow  test:  Since  the  illumination  direction  is  available  from  the  site  model,  we 
delineate  a  region  where  the  shadow  of  the  proposed  cylinder  might  appear.  If  we  find 
a  supporting  shadow  (a  homogeneously  dark  region)  bounded  by  a  pair  of  parallel  lines 
within  the  region,  the  hypothesis  is  accepted. 

3.  Homogeneity  test:  We  can  also  check  the  intensity  variations  within  the  ellipse  and 
the  region  bounded  by  the  parallel  lines.  If  these  variations  are  much  smaller  than  the 
intensity  variation  in  the  image,  we  accept  the  hypothesis. 

Once a hypothesis passes the above tests, the detected cylinder is reported to the IA.
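
The homogeneity test lends itself to a short illustration. The sketch below is an assumption-laden example rather than the test used in the system: the image is a list of rows of gray levels, the region is a list of (row, column) pixels covering the ellipse and the band between the parallel lines, and "much smaller" is modeled by a hypothetical ratio parameter applied to the standard deviations.

```python
import statistics

def homogeneity_test(image, region, ratio=0.5):
    """Accept when the intensity spread inside the region is well below
    the spread of the whole image. `ratio` is a hypothetical threshold."""
    all_pixels = [v for row in image for v in row]
    region_pixels = [image[r][c] for r, c in region]
    return statistics.pstdev(region_pixels) < ratio * statistics.pstdev(all_pixels)
```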

3.5.6.  An  Example:  Chimney  Detection 

In  Figure  11  an  example  of  cylindrical  object  detection  is  shown:  (a)  a  new  image;  (b)  the 
region  of  interest  delineated  using  the  site  model;  (c)  the  results  of  edge  detection;  (d)  the 
edge  map  after  line  linking;  (e)  cylinders  detected  in  the  new  image;  (f)  an  earlier  image 
of  the  same  site  (the  old  image  has  been  registered  to  the  coordinates  of  the  new  image); 
(g)  the  results  of  cylindrical  object  detection  when  the  same  procedure  is  applied  to  the 
earlier image; and (h) the registration of the cylindrical objects detected in both images.

Figure 11: New construction detection. Panels: (c) edge detection; (d) edge map after line linking; (e) cylinders detected in the new image; (f) an earlier image of the same site; (g) cylinders detected in the earlier image; (h) change analysis.

Three cylinders are detected in the new image and two in the earlier image. A report that
a new cylindrical object (the middle one in (e)) has been built since the site was last
investigated is sent to the IA. The location of the new cylindrical object can be highlighted
(not shown here).
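
The change analysis step in this example amounts to comparing the objects found in two registered images. A minimal sketch of that comparison follows, assuming each detection is summarized by its image-plane center and that two detections correspond when their centers lie within a hypothetical tolerance `tol` (in pixels). The function name and the tolerance are illustrative only.

```python
import math

def find_new_objects(new_centers, old_centers, tol):
    """Return centers detected in the new image with no counterpart
    within `tol` pixels among the detections of the earlier image."""
    new_objects = []
    for cx, cy in new_centers:
        matched = any(math.hypot(cx - ox, cy - oy) <= tol
                      for ox, oy in old_centers)
        if not matched:
            new_objects.append((cx, cy))
    return new_objects
```

With three detections in the new image and two in the earlier one, the unmatched center is the candidate for new construction.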

3.6.  Detecting  and  Counting  Vehicles 

The subsystem for another site-model-supported image monitoring task, detecting and
counting vehicles, is described in this section. In our implementation, vehicles
are modeled as 3-D boxes with width, length and height specifications. Figure 12 shows the
block diagram of the subsystem for detecting and counting vehicles. As shown in Figure 12,
the 3-D object model and site information (camera model, illuminant, etc.) are used
throughout the procedure. Details of the implementation are discussed below.

3.6.1. Edge Detection

The modified Canny edge detector [15] is used for detection of edges and their gradient
directions. Let H be a mask and define its inner product with an image U at location (m, n)
as

$$\langle U, H \rangle = \sum_{i}\sum_{j} h(i, j)\, u(i + m, j + n) = u(m, n) \circledast h(-m, -n)$$

Two mutually orthogonal masks, $H_1$ and $H_2$, are used in our implementation. Let

$$g_1(m, n) = \langle U, H_1 \rangle$$

$$g_2(m, n) = \langle U, H_2 \rangle$$

then the magnitude and direction of the gradient vector are

$$g(m, n) = \sqrt{g_1^2(m, n) + g_2^2(m, n)}$$

$$\theta_g(m, n) = \arctan\frac{g_2(m, n)}{g_1(m, n)}$$
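
To make the mask-based gradient computation concrete, here is a small Python sketch. The masks used by the modified Canny detector are not given in the text, so a pair of Sobel masks is substituted as an assumed stand-in for $H_1$ and $H_2$. The sketch also uses `atan2` instead of a plain arctangent so that the quadrant of the direction is kept, and it is valid only for interior pixels.

```python
import math

# Assumed stand-in masks (Sobel); the masks of the actual detector may differ.
H1 = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to change along columns
H2 = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to change along rows

def inner_product(u, h, m, n):
    """<U, H> at (m, n): sum of h(i, j) * u(i + m, j + n), mask centered at (m, n)."""
    r = len(h) // 2
    return sum(h[i + r][j + r] * u[m + i][n + j]
               for i in range(-r, r + 1)
               for j in range(-r, r + 1))

def gradient(u, m, n):
    """Return (magnitude, direction in radians) of the gradient at (m, n)."""
    g1 = inner_product(u, H1, m, n)
    g2 = inner_product(u, H2, m, n)
    return math.hypot(g1, g2), math.atan2(g2, g1)
```

On an image whose gray level increases by one per column, the response of the first mask is constant and the second vanishes, so the direction is zero everywhere in the interior.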


3.6.2.  Vehicle  Representation 

A vehicle is modeled as a 3-D box characterized by the following parameters:

• Width: wsd;

• Length:

• Height: hsd;

• Center: $(x_c, y_c, z_c)$;

• Rotation matrix: a 3 × 3 matrix that describes the orientation of the local coordinate frame.

Figure 12: Flowchart for vehicle detection


With the camera model given, we compute the 2-D projection of the 3-D model. Since the
height of a vehicle is normally smaller than its length and width and, in addition, images are
taken from above, the vertical contour of the vehicle is negligible and the vehicle can be
well approximated by a 2-D rectangle. In our implementation, we search for rectangles
to locate candidate vehicles.

3.6.3.  Search  Scheme 

A modified generalized Hough transform (GHT) is used to locate possible vehicles (by
extracting the centers of candidate rectangles). The basic idea is to vote for the possible loci of
reference points from the detected edge points.

In our case, the reference point is the center of the rectangle. For each edge point, we
also compute its gradient direction. The location of the reference point is represented as
a function of the gradient direction. All such locations, indexed by gradient direction, are
precomputed to form a table (see Table 5). The relevant geometry used to form the table is
shown in Figure 13. The search algorithm is described as follows.


Table 5: Indexed table for reference points

Gradient direction of edge points    Set of radii $r_k^i$, where $r = (r, \alpha)$
$\phi_1$                             $r_1^1, r_2^1, \ldots$
$\phi_2$                             $r_1^2, r_2^2, \ldots$
$\phi_3$                             $r_1^3, r_2^3, \ldots$
...                                  ...
$\phi_{n-1}$                         $r_1^{n-1}, r_2^{n-1}, \ldots$

Step 1 - Make a table for the rectangle to be located.

Step 2 - Create an accumulator array of possible reference points, $A(x_{min} : x_{max}, y_{min} : y_{max})$,
and initialize it to zero.

Step 3 - Compute $\phi(x)$ for each edge pixel and vote for the possible center of an associated
rectangle at

$$x_c = x + r(\phi)\cos[\alpha(\phi)]$$
$$y_c = y + r(\phi)\sin[\alpha(\phi)]$$

Step 4 - A candidate vehicle is formed for each candidate center whose vote is above a threshold.
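
The four steps can be sketched as follows. This is a simplified illustration and not the modified GHT of the system: it assumes an axis-aligned rectangle, gradient directions quantized to whole degrees along the outward normals, and a dictionary in place of the accumulator array. All function names are hypothetical.

```python
import math
from collections import defaultdict

def rectangle_edge_points(cx, cy, w, l):
    """Boundary pixels (x, y, phi) of an axis-aligned rectangle centered at
    (cx, cy); phi is the assumed outward gradient direction in degrees."""
    x0, x1 = cx - w // 2, cx + w // 2
    y0, y1 = cy - l // 2, cy + l // 2
    pts = []
    for x in range(x0, x1 + 1):
        pts.append((x, y0, 270))
        pts.append((x, y1, 90))
    for y in range(y0 + 1, y1):
        pts.append((x0, y, 180))
        pts.append((x1, y, 0))
    return pts

def build_r_table(w, l):
    """Step 1: index the (r, alpha) vectors to the center by gradient direction."""
    table = defaultdict(list)
    for x, y, phi in rectangle_edge_points(0, 0, w, l):
        table[phi].append((math.hypot(x, y), math.atan2(-y, -x)))
    return table

def vote(edge_points, table):
    """Steps 2 and 3: accumulate votes for possible centers."""
    acc = defaultdict(int)
    for x, y, phi in edge_points:
        for r, alpha in table.get(phi, []):
            xc = round(x + r * math.cos(alpha))
            yc = round(y + r * math.sin(alpha))
            acc[(xc, yc)] += 1
    return acc

def candidate_centers(acc, threshold):
    """Step 4: keep centers whose vote is above the threshold."""
    return [c for c, v in acc.items() if v > threshold]
```

For a clean rectangle every boundary pixel casts exactly one vote at the true center, so the accumulator peak equals the number of edge pixels.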
Figure 13: Geometry used to compute the reference point

3.6.4.  Hypothesis  Generation 

For each candidate rectangle obtained, we generate a 3-D vehicle and compute its contours
in the image. We then compare the contours with the local edge map. If the match is above
a threshold, a hypothesis that there is a vehicle at the corresponding location is formed. In
our implementation, a rubber-band rectangle template is used to evaluate the matching. A
rubber-band rectangle template is similar to a rectangle template with a tolerance band but
guarantees that each pixel on the template can get no more than one vote from the pixels
on the edge map along the perpendicular direction. We check not only the overall matching,
but also the degree of matching on the boundaries in directions along and perpendicular to
the vehicle direction. Therefore, to qualify as a vehicle, the candidate rectangle has to
have (almost) complete boundaries on both parallel sides.

3.6.5.  Hypothesis  Verification 

The image of a 3-D vehicle differs from a 2-D rectangle in that there should be a
shadow associated with the detected rectangle. Using the illuminant model available from
the site model, we can form a hypothesis about the associated shadow region, which includes
constraints on the position, size, intensity, and shape of the shadow region. We then search for
a shadow next to the candidate vehicle. If the detected shadow is consistent with the
prediction from the illuminant model, the candidate vehicle is confirmed. This hypothesis
verification method is still under development. With improvements in image resolution
and the availability of more accurate vehicle models, we plan to develop a more sophisticated
verification mechanism. The vehicle detection results presented in this section were obtained
before verification.

3.6.6. Experiments


Vehicle detection in a parking area: In Figure 14, an example of vehicle detection in a
parking area is shown: (a) an image to be exploited; (b) the area corresponding to the garage
of interest, delineated from the region information in the site model; (c) a zoom-in view of
the region of interest; (d) the detected vehicles. For vehicle detection in the parking area,
we used information about the garage orientation to constrain the possible vehicle parking
direction. In this case, a report of "the garage is about half full" was sent to the IA.

Vehicle  detection  on  roads:  In  Figure  15,  an  example  of  monitoring  vehicles  on  roads 
specified by the IA (through a QL profile) is shown: (a) an input image, (b) a window to
be  monitored,  (c)  the  area  corresponding  to  roads  of  interest.  Since  vehicles  drive  along 
the  road  direction,  the  directions  of  the  roads  are  also  generated  and  used  as  an  additional 
constraint  for  vehicle  detection.  In  the  algorithm,  only  candidates  whose  orientations  are 
approximately  along  the  road  direction  are  considered  to  be  valid  vehicles  on  the  roads. 
Finally,  in  (d)  the  detected  vehicles  are  shown. 

Vehicle detection on a training ground: In Figure 16, an example of vehicle detection
on a training ground is shown: (a) an input image; (b) a window to be monitored; (c) the area
corresponding to the training ground, which is of intelligence interest; and (d) the detected
vehicles. Since vehicles on a training ground can be oriented in any direction, we have to
detect possible vehicles in all directions.

3.7.  Ground  Plane  Image-to-Image  Registration 

Image-to-image registration is required in the following situations: (1) It is important for
setup of an initial site model, especially when no ground control points are available.
Image-to-image registration can provide 3-D coordinates of some control points through
triangulation. (2) It is critical for automatic registration of new images into an existing site model.
From the site model we may have 3-D coordinates of some feature points; to locate the
image plane positions of these feature points we need an image-to-image registration algorithm.
(3) It is needed for generating 2-D region delineations corresponding to arbitrary viewing directions. In
[18], Zheng and Chellappa developed an automatic image-to-image registration technique
for nadir images. The work was later extended to automatic registration of oblique images
[4]. On the RADIUS project, partial knowledge about the cameras is available. We have
developed algorithms which use the partially known camera parameters to perform image
registration efficiently. Details of the image-to-image registration technique are reported in
this section.

Figure 14: Vehicle detection in a parking area

Figure 15: Vehicle detection on communication roads

Figure 16: Vehicle detection in a training ground

3.7.1. Relationship between Two Images

Assuming we have two sets of camera parameters, a point $(x_w, y_w, z_w)^t$ in the world
coordinates is represented in the two camera-centered coordinate systems by

$$\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} = R_1 \begin{pmatrix} x_w - x_{10} \\ y_w - y_{10} \\ z_w - z_{10} \end{pmatrix} = \begin{pmatrix} r^1_{11} & r^1_{12} & r^1_{13} \\ r^1_{21} & r^1_{22} & r^1_{23} \\ r^1_{31} & r^1_{32} & r^1_{33} \end{pmatrix} \begin{pmatrix} x_w - x_{10} \\ y_w - y_{10} \\ z_w - z_{10} \end{pmatrix} \qquad (49)$$

and

$$\begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} = R_2 \begin{pmatrix} x_w - x_{20} \\ y_w - y_{20} \\ z_w - z_{20} \end{pmatrix} = \begin{pmatrix} r^2_{11} & r^2_{12} & r^2_{13} \\ r^2_{21} & r^2_{22} & r^2_{23} \\ r^2_{31} & r^2_{32} & r^2_{33} \end{pmatrix} \begin{pmatrix} x_w - x_{20} \\ y_w - y_{20} \\ z_w - z_{20} \end{pmatrix}$$

respectively. So the transform from $(x_1, y_1, z_1)^t$ to $(x_2, y_2, z_2)^t$ is

Note that for points on the ground plane we have $z_w = D$, a constant, so that

Assume $f_1$ and $f_2$ are the focal lengths of camera-1 and camera-2, $\epsilon_1$ and $\epsilon_2$ are the pixel
spacings for image-1 and image-2, and $(X_1, Y_1)^t$ and $(X_2, Y_2)^t$ are the image plane coordinates
for image-1 and image-2, respectively; then using central projection we have


or


For image-2 we have

So the ground plane transform from a pixel $(X_1, Y_1)$ in image-1 to image-2 is given by

3.7.2. Registration Using Known Camera Parameters

When the camera parameters are available, we can register the ground planes of any two
images using (60-70). In Figure 17, the registration of two oblique images of different
resolution is shown: (a) a high resolution model board image, M1, with ground sample
distance (GSD) equal to 15 inches, $\alpha = 30.7^\circ$, $\beta = 339.2^\circ$, and $\gamma = 197.6^\circ$; (b) a low resolution
model board image, M40, with GSD = 26 inches, $\alpha = 44.7^\circ$, $\beta = 254.8^\circ$, and $\gamma = 340.4^\circ$; (c)
the registration of M40 to M1; and (d) the registration of M1 to M40.


3.7.3.  Registration  with  Unknown  Camera  Parameters 


When no information about the camera is available, we can still register two oblique images
by automatically matching (at least) four corresponding points and solving for the transform
parameters in (60-61). For the principal point of image-1 we have $(X_1, Y_1) = (0, 0)$; its
corresponding  location  in  the  coordinates  of  image-2  is  (^2,12)  =  (§>  g)-  ^ 

two cameras are well above the ground, the principal point of image-1 must be a well-defined
point (finite) in the coordinates of image-2. Hence $G \neq 0$. The ground plane transformation
of image-1 to image-2 can be determined in terms of eight parameters $a_i$, $i = 1, \ldots, 8$ as

Figure 17: Registration of two oblique images (camera parameters are known).

When camera parameters are not available, the eight parameters are obtained by solving
the linear equations

$$a_1 + X_{1i} a_3 + Y_{1i} a_5 + X_{1i} X_{2i} a_7 + Y_{1i} X_{2i} a_8 = X_{2i} \qquad (73)$$

$$a_2 + X_{1i} a_4 + Y_{1i} a_6 + X_{1i} Y_{2i} a_7 + Y_{1i} Y_{2i} a_8 = Y_{2i} \qquad (74)$$

for $i = 1, \ldots, N$, where N is the number of matched points.
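
A linear system of this kind can be solved in the least-squares sense once at least four matches are available. The sketch below is one possible way to do it, not the solver used in the system. It assumes the two equations per match have the form a1 + X1·a3 + Y1·a5 + X1·X2·a7 + Y1·X2·a8 = X2 and a2 + X1·a4 + Y1·a6 + X1·Y2·a7 + Y1·Y2·a8 = Y2, forms the normal equations, and solves them by Gaussian elimination. The helper `apply_transform` is a hypothetical convenience obtained by rearranging those two equations for (X2, Y2).

```python
def solve_linear_system(a_mat, b_vec):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(a_mat)
    m = [row[:] + [bi] for row, bi in zip(a_mat, b_vec)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def estimate_parameters(pts1, pts2):
    """Least-squares estimate of [a1, ..., a8] from N >= 4 matched points."""
    rows, rhs = [], []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        rows.append([1.0, 0.0, x1, 0.0, y1, 0.0, x1 * x2, y1 * x2])
        rhs.append(x2)
        rows.append([0.0, 1.0, 0.0, x1, 0.0, y1, x1 * y2, y1 * y2])
        rhs.append(y2)
    # Normal equations (A^T A) a = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(8)]
    return solve_linear_system(ata, atb)

def apply_transform(a, x1, y1):
    """Map an image-1 point to image-2 under the assumed equation form."""
    den = 1.0 - a[6] * x1 - a[7] * y1
    return ((a[0] + a[2] * x1 + a[4] * y1) / den,
            (a[1] + a[3] * x1 + a[5] * y1) / den)
```

Normal equations square the condition number of the system, so for badly scaled pixel coordinates it may be preferable to normalize the points first or to use an orthogonal factorization.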

Overview of the registration algorithm: Figure 18 illustrates the image registration
algorithm. Given two images, we first use an illuminant direction estimator [17, 18] to get an
initial estimate of the camera orientation change. A small number of feature points are then
located using a Gabor wavelet model for detecting local curvature discontinuities [9]. The
feature points extracted from different frames are matched using area correlation. Three
match verification tests are used to exclude false matches. After the initial matching is
achieved, a multiresolution transform-and-correct matching is implemented to obtain high
accuracy registration. At each resolution, image-2 is first transformed to the coordinates of
image-1 using the estimated matching parameters and then match refinement is performed
on the feature points extracted in image-1.

Feature point detection: For feature point extraction we use a Gabor wavelet
decomposition and the local scale interaction based algorithm reported in [9]. The basic wavelet
function used in the decomposition is of the form

$(X,y,d)  =  (75) 

$$X' = X\cos\vartheta + Y\sin\vartheta$$

$$Y' = -X\sin\vartheta + Y\cos\vartheta$$

where $\vartheta$ is the preferred spatial orientation. In our experiments $\vartheta$ is discretized into four
orientations. The feature points are extracted as the local maxima of the energy measure

$$I(X, Y) = \max_{\vartheta}\{\|W_{j_1}(X, Y, \vartheta) - \gamma W_{j_2}(X, Y, \vartheta)\|\} \qquad (76)$$

where

$$W_j(X, Y, \vartheta) = f \otimes \Phi(2^{-j}X, 2^{-j}Y, \vartheta), \quad j \in \{j_1, j_2\}.$$

Here $j_1$ and $j_2$ are two dilation parameters, and $\gamma$ is a normalizing factor. In
implementing the above algorithm, we further require the energy measure for a feature point
to be the maximum in a neighborhood with radius equal to 10 and above a threshold.

Match  verification:  In  our  algorithm,  the  initial  matching  is  implemented  on  2-D  rotation 
compensated  images.  Since  no  further  knowledge  about  the  camera  parameters  is  used  in 
the  initial  matching,  false  matches  due  to  perspective  deformation  and  similarities  between 
similar  objects  are  inevitable.  Automatic  exclusion  of  these  false  matches  is  a  key  to  success 
in  image  registration.  We  have  used  three  tests  to  exclude  less  reliable  matches. 

1. Distance test: The translation between the rotation-compensated images should not
be larger than a certain fraction of the image size. A valid matching pair, $(X_r, Y_r)$ and
$(X_l, Y_l)$, should satisfy

$$d_x = |X_r - X_l| < \lambda L_x,$$

$$d_y = |Y_r - Y_l| < \lambda L_y, \qquad (77)$$

$$|X_r - X_l| + |Y_r - Y_l| < \kappa \max\{L_x, L_y\}$$

For  example,  A  =  5  and  k  —  |A.  L,  and  Ly  are  image  size  along  x  and  y  directions 
respectively. 

2. Variation test: The translations used in the correct matches should support each
other, i.e.

$$|d_i - \bar{d}| < \mu\sigma \qquad (78)$$

where $d_i$ is the distance between the matching pair, $\bar{d}$ and $\sigma$ are the mean and
standard deviation of the distances for all the matched feature pairs, and $\mu$ is a threshold,
for example $\mu = \sqrt{3}$ for the uniform distribution.

3.  Outlier  exclusion:  The  matched  feature  pairs  should  satisfy  the  image  transform 
model.  Candidate  matching  pairs  with  large  residual  errors  should  be  excluded.  This 
test  also  helps  to  exclude  matches  on  building  roofs,  etc. 
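
The first two verification tests are simple enough to sketch directly. The following Python fragment is illustrative only: it assumes the distance test bounds |X_r - X_l| and |Y_r - Y_l| by a fraction `lam` of the image size and their sum by `kappa` times the larger image dimension, and that the variation test keeps a match when its distance lies within `mu` standard deviations of the mean distance. The parameter values in any call are hypothetical, the population standard deviation is used, and the handling of a zero standard deviation is an added convention.

```python
import statistics

def distance_test(pr, pl, lx, ly, lam, kappa):
    """Reject matches whose translation is too large relative to the image size."""
    dx = abs(pr[0] - pl[0])
    dy = abs(pr[1] - pl[1])
    return dx < lam * lx and dy < lam * ly and dx + dy < kappa * max(lx, ly)

def variation_test(distances, mu=3 ** 0.5):
    """Flag each match distance as consistent (True) or not (False)
    with the mean distance of all matched pairs."""
    mean = statistics.mean(distances)
    sigma = statistics.pstdev(distances)
    if sigma == 0.0:
        # All distances identical: treat every match as consistent (assumption).
        return [True] * len(distances)
    return [abs(d - mean) < mu * sigma for d in distances]
```

The default `mu` of sqrt(3) corresponds to the half-width of a uniform distribution measured in standard deviations.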

Experimental results: In Figures 19 and 20, the registration of two aerial images is
shown: (a) the image taken by the first camera; (b) the image taken by the second camera;
(c) the registration of (b) to (a); and (d) the difference between (a) and (c).

Figure 19: Registration of two aerial images (Example-1)

Figure 20: Registration of two aerial images (Example-2)

4. Ongoing and Future Work

4.1. Hierarchical Model-Based Segmentation

We are developing a general model-based procedure for image segmentation based on a
hierarchical connected component analysis. This method will be useful for detection and
counting as well as for change detection based on comparing the components of the
segmentation algorithm. This multi-level segmentation is used as a search space for various
complex objects. The hierarchical connected component analysis procedure consists of a
multi-stage, region-growing type of segmentation. The initial stage is the result of an initial
segmentation of the image into connected components. In our implementation, two adjacent
pixels are considered to be connected if the difference in their gray level values is less than
a threshold $\epsilon$. Each successive stage merges adjacent components (or regions) of the
previous stage. The selection of the regions to be merged is based on local analysis of region
properties. Currently, only average boundary contrast is used. The new stage represents a
coarser segmentation of the image. A complete hierarchy is built, i.e., the merging process
ends when there are no more regions to merge.

The  hierarchy  is  used  as  a  search  space  for  diverse  objects.  Currently,  a  model  for  an 
object  of  interest  is  interactively  created.  The  model  includes  various  distinctive  elements 
of  the  object  and  geometric  and  topological  relations  among  them.  The  search  process  tests 
for  the  presence  of  these  elements  at  several  levels  of  the  hierarchy.  It  is  expected  that  intact 
or  nearly  intact  elements  of  the  object  appear  at  coarser  levels,  thus  allowing  us  to  find  the 
object  using  minimal  search. 

The  general  paradigm  for  extracting  candidate  objects  from  an  image  is  the  following: 

Locale  specification:  A  locale  in  the  image  is  selected  to  start  the  process  of  finding 
object  candidates.  A  locale  is  defined  with  respect  to  known  objects  or  it  can  correspond  to 
the  whole  image.  The  set  of  basic  connected  components  that  lie  within  a  locale  is  called 
the basis $B_0$.

Segmentation: As previously mentioned, the segmentation is a simple gray-level
connected components algorithm. The result is a labeled image, in which each connected
component is assigned a unique value. These connected components will be referred to as basic
components. A characteristic of this set is that any boundary between any two components
has an average contrast greater than the threshold $\epsilon$.
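
The gray-level connected components step can be illustrated with a short union-find sketch. It assumes 4-adjacency and that two adjacent pixels are connected when the absolute difference of their gray levels is below a threshold `eps`. The adjacency choice and the function name are assumptions, not details taken from the implementation.

```python
def connected_components(image, eps):
    """Label an image (list of rows of gray levels). Labels are assigned
    in raster-scan order starting from 0."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and abs(image[r][c] - image[r][c + 1]) < eps:
                union(r * cols + c, r * cols + c + 1)
            if r + 1 < rows and abs(image[r][c] - image[r + 1][c]) < eps:
                union(r * cols + c, (r + 1) * cols + c)

    labels, out = {}, [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            root = find(r * cols + c)
            if root not in labels:
                labels[root] = len(labels)
            out[r][c] = labels[root]
    return out
```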

Next, a region adjacency graph (RAG) is constructed from the basic component set. This
RAG will be referred to as RAG(0). The parameter 0 indicates that it is the initial RAG
of the hierarchy, which is computed next. Several properties of each region are computed;
they include area, perimeter, boundary average intensity contrast, etc. In parallel, a list of
boundaries is computed from RAG(0), and it is sorted in increasing order of the boundaries'
average intensity contrast.

Hierarchy of segments: Starting from RAG(0), the hierarchy consists of the sequence
of adjacency graphs RAG(0), RAG(1), ..., RAG(i), ..., RAG(n). Each RAG(i) is formed
by merging the regions whose common boundary has minimum average contrast (CONT(i))
in RAG(i - 1). Therefore, any boundary in RAG(i) has average contrast greater than
CONT(i). The minimum contrast boundaries at each stage are located through the
precomputed boundary list. This list is updated after merging regions (i.e., after creating RAG(i)),
since new edges are created, some become redundant, and the ones with contrast equal to
CONT(i) disappear.

A unique symbolic representation is maintained for each region at each level of the
hierarchy. Let $r_{i,j}$ be the $j$th region (in some arbitrary order) in RAG(i). Each region $r_{i,j}$ in
RAG(i) has two kinds of link: (1) a link to each of the regions {$r_{k,l}$ | k < i and $r_{k,l}$ is a
component of $r_{i,j}$} and (2) a link to region $r_{m,n}$, where m > i and $r_{i,j}$ is a component of
$r_{m,n}$. For all j, the first link of $r_{0,j}$ is NULL; if some $r_{i,j}$ is not a component of any region,
its second link is NULL.

The  hierarchy  can  be  viewed  abstractly  as  a  tree,  where  each  node  in  the  tree  is  a  region  (a 
basic  component  or  a  multiple  basic  component  region).  The  lowest  level  corresponds  to  the 
basic  components.  Given  that  there  are  n  basic  components  obtained  from  a  segmentation 
using threshold $\epsilon$, there will be at most 2n nodes in the tree.
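
The construction of the merge tree can be sketched as follows. This is a deliberately simple illustration under several assumptions: boundaries are recomputed from scratch at every stage instead of being kept in a sorted list, boundary contrast is the mean absolute gray-level difference over 4-adjacent pixel pairs that straddle two regions, and exactly one pair of regions is merged per stage. Input labels are assumed to run from 0 to n - 1.

```python
def build_hierarchy(image, labels):
    """Return {new_node_id: (child_a, child_b, contrast)} describing the
    merge tree built by repeatedly merging the lowest-contrast boundary."""
    rows, cols = len(image), len(image[0])
    owner = [row[:] for row in labels]            # current region of each pixel
    next_id = 1 + max(max(row) for row in labels)
    children = {}
    while True:
        sums, counts = {}, {}
        for r in range(rows):
            for c in range(cols):
                for rr, cc in ((r, c + 1), (r + 1, c)):
                    if rr < rows and cc < cols and owner[r][c] != owner[rr][cc]:
                        key = tuple(sorted((owner[r][c], owner[rr][cc])))
                        diff = abs(image[r][c] - image[rr][cc])
                        sums[key] = sums.get(key, 0.0) + diff
                        counts[key] = counts.get(key, 0) + 1
        if not sums:                              # no boundaries left to merge
            break
        key = min(sums, key=lambda k: sums[k] / counts[k])
        children[next_id] = (key[0], key[1], sums[key] / counts[key])
        for r in range(rows):
            for c in range(cols):
                if owner[r][c] in key:
                    owner[r][c] = next_id
        next_id += 1
    return children
```

Because each stage merges two regions into one new node, n basic components of a connected image yield n - 1 merged nodes, which is consistent with the bound of at most 2n nodes in the tree.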

Search: The hierarchy is the basis for extracting information during the search process.
The search elements are regions (2D structures). The search procedure initially looks for
a basis (a level in the hierarchy) that includes at least one seed. A seed is a region that
satisfies necessary conditions specified by the model. A seed is preferably chosen high in the
hierarchy since it is desired that complete objects be found as early in the search as possible
(top-down approach).

Next,  search  looks  for  combinations  of  regions  that  satisfy  the  conditions  expressed  by 
the  model.  It  is  guided  by  predefined  search  heuristics.  The  purpose  of  the  heuristic  search 
is  to  systematically  order  the  search  space  in  order  to  attain  a  complete,  yet  efficient,  search. 
The  final  output  is  a  list  of  object  candidates. 

4.2. Automatic Image-to-Site-Model Registration

Currently, image-to-site-model registration requires that the IA manually select and adjust
several control points whose 3-D coordinates in the world coordinate system are known. On
the RADIUS project, it is assumed that approximate camera parameters are available. We
are developing two automatic image-to-site-model registration algorithms. When
approximate 3-D coordinates of the camera stare point are available, we will use an image-to-image
registration algorithm to automatically search for the image domain locations of control
points whose 3-D coordinates are available from the site model and perform camera
resection to get an accurate camera model for the newly acquired image. When the camera stare
point is unknown, even with given approximate camera orientation information, the
displacement between the new image and the projected world coordinates can be quite large. We
will first perform automatic feature detection to select a small set of feature points and then
do image-to-image registration based on these feature points. We will do another
image-to-image registration to get the image domain locations of a set of control points whose 3-D
world coordinates are known. Camera resection can then be performed and an accurate
camera model for the new image can be obtained.

4.3.  Automatic  Optimum  Image  Selection 

Given a change monitoring task in a specific region, several images are usually available. How
to automatically select the best images for the given monitoring task based on the scene,
illuminant and imaging conditions is an interesting research topic. We plan to develop an
automatic site analysis algorithm which will analyze visibility, detectability, and
unambiguity, and will generate invariance measures for each feature object. These measurements will
also be useful for automatic control point selection and model-supported optimization. We
also plan to develop a shadow detection and correction algorithm.

4.4.  QL  Interfaces 

Based on our progress in using RCDE in connection with vehicle detection and construction
monitoring, we are working with members of the TASC team and with some RADIUS users to
develop more sophisticated QL profiles. This will include more sophisticated model-based
object detection algorithms and user-friendly menu- and query-driven image exploitation
recipes.

4.5.  Integration  of  Collateral  Information 

An advantage of model-supported image analysis is that collateral information can be used
to improve efficiency and accuracy. The more collateral information is used, the easier the
monitoring tasks become. Currently, collateral information such as a region map is manually
generated for the site model. We plan to develop a semiautomatic region map generation
algorithm. The following scenarios will be considered: (1) When collateral information is
available on an ordinary map, we will use an automatic curve tracing algorithm to transfer
the region curves from the map to the site model. (2) When images taken from different
types of sensor are available, we will derive regions from composition of segmentation results
using images taken from an appropriate sensor. For example, SAR images are good for
segmentation of water, concrete structures, and vegetation. (3) Region information can also
be derived from an associated digital terrain map, when it is available. We will integrate the
database management facility provided by the THREAD project into our system. We also
plan to integrate an image synthesis capability into our system. We will also investigate the
incorporation of temporal information into the monitoring algorithm.

5.  Other  Related  Work 

5.1.  Feature  Extraction  in  SAR  images 

The  RADIUS  project  will  benefit  by  progress  in  high  resolution  SAR  imagery  analysis  tasks
such  as  region  segmentation  and  target  detection.  Recently,  we  have  developed  a  constant
false  alarm  rate  (CFAR)  point  target  detection  algorithm  for  high  resolution  SAR  imagery
[16].  Traditional  CFAR  detection  algorithms  produce  many  false  targets  when  applied  to
single-look,  high-resolution,  fully  polarimetric  SAR  images,  due  to  the  presence  of  speckle.
We  have  developed  a  two-stage  CFAR  detector  followed  by  conditional  dilation  for  detecting
point  targets  in  polarimetric  SAR  images.  In  the  first  stage  possible  targets  are  detected,  and
false  targets  due  to  the  speckle  are  removed  by  using  global  statistical  parameters.  In  the
second  stage,  the  local  statistical  parameters  are  used  to  detect  targets  in  regions  adjacent  to 
targets  detected  in  the  first  stage.  Conditional  dilation  is  then  performed  to  recover  target 
pixels  lost  in  second  stage  CFAR  detection.  The  performance  of  a  CFAR  detector  is  degraded
if  an  incorrect  statistical  model  is  adopted  and  the  data  are  correlated.  A  goodness-of-fit  test
is  performed  to  choose  the  appropriate  distribution,  and  the  effects  of  decorrelation  of  the
data  are  considered.  Good  experimental  results  were  obtained  when  our  method  was  applied
to  single-look,  high-resolution,  fully  polarimetric  SAR  images  acquired  from  Dr.  Les  Novak
of  MIT  Lincoln  Laboratory.  We  have  also  developed  a  CFAR  detector  for  non-Gaussian
clutter  distributions  such  as  the  K,  Weibull  and  lognormal  distributions.  This  algorithm  has 
been  tested  on  single  look,  single  polarization  SAR  images. 
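
The detect-then-recover idea described above can be illustrated with a minimal sketch. It is not the detector of [16]: it assumes exponentially distributed single-look intensity clutter, uses a strict global threshold to obtain seed detections, and substitutes a looser global threshold for the local second-stage test. All function names and parameters (global_cfar, conditional_dilation, pfa) are hypothetical and introduced only for illustration.

```python
import numpy as np

def global_cfar(intensity, pfa):
    """Threshold from the global mean. Assumes exponential clutter,
    for which P(x > T) = exp(-T / mean), so T = -mean * ln(pfa)."""
    threshold = -np.mean(intensity) * np.log(pfa)
    return intensity > threshold

def dilate(mask):
    """Binary dilation with a 3x3 structuring element."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    rows, cols = mask.shape
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out

def conditional_dilation(seed, candidates):
    """Grow `seed` inside `candidates` until the mask stops changing."""
    current = seed & candidates
    while True:
        grown = dilate(current) & candidates
        if np.array_equal(grown, current):
            return current
        current = grown

# Usage: strict detections act as seeds; weaker pixels are kept only
# if they connect to a seed (the looser threshold is an assumption
# standing in for the local second-stage test).
image = np.ones((32, 32))
image[10:13, 10:13] = 12.0   # weak target pixels
image[11, 11] = 200.0        # strong scatterer
seeds = global_cfar(image, pfa=1e-6)
candidates = global_cfar(image, pfa=1e-3)
targets = conditional_dilation(seeds, candidates)
```

Because the dilation is constrained by the candidate mask, an isolated pixel that passes only the looser threshold is not added to the result unless it touches a seed detection.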

5.2.  Building  Delineation 

Building  detection  is  of  interest  in  site  model  construction  and  change  monitoring.  Recently, 
we  have  developed  an  energy  function  based  approach  for  detection  of  rectangular  shapes 
in  an  image.  Our  building  detection  algorithm  is  based  on  line  grouping  [8].  The  proposed 
edge-based  approach  involves  extracting  straight  lines  from  an  edge  map  of  the  image.  Then 
a  Markov-random  field  (MRF)  is  built  on  these  lines,  i.e.,  a  suitable  neighborhood  and  an 
energy  function  are  specified  based  on  the  relative  orientations  and  spatial  locations  of  the
lines.  This  energy  function  can  be  construed  as  a  measure  of  the  conditional  probability 
of  observing  the  lines  given  the  rectangular  shapes  (the  positions  and  number  of  which  are 
unknown)  in  the  image.  Minimizing  the  energy  function  is  equivalent  to  selecting  maximum 
likelihood  estimates  of  the  rectangular  shapes  in  the  image  from  the  observed  lines.  Simulated
examples  are  presented  to  demonstrate  the  robustness  of  the  proposed  method.  This
approach,  supplemented  with  some  qualitative  information  about  shadows  and  gradients,
has  been  used  to  detect  rectangular  buildings  in  real  aerial  images.  Due  to  the  poor  quality
of  the  real  images,  only  partial  shapes  are  extracted  in  some  cases.  A  modified  deformable 
contour  (“snakes”)  based  approach  is  then  used  for  completion  of  the  partial  shapes. 
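
The equivalence between energy minimization and maximum likelihood estimation stated above follows if the conditional probability is taken to have the usual Gibbs form associated with an MRF. The notation below is an assumption introduced for illustration: L denotes the set of observed lines, R a candidate configuration of rectangular shapes, E the energy function, and Z a normalizing constant taken not to depend on R. The explicit form of the energy is not given here.

```latex
P(L \mid R) = \frac{1}{Z} \exp\bigl(-E(L, R)\bigr)
\quad\Longrightarrow\quad
\hat{R}_{\mathrm{ML}} = \arg\max_{R} P(L \mid R) = \arg\min_{R} E(L, R)
```

Since the exponential is monotonically decreasing in E, the configuration with the lowest energy is the one under which the observed lines are most probable.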

6.  Summary  and  Conclusions 

At  the  end  of  the  first  year  of  the  RADIUS  project,  we  have  made  considerable  progress 
on  mastering  RCDE,  developed  some  prototypes  of  QL  profiles  for  imagery  monitoring,  and 
transferred  some  of  our  results  to  Martin  Marietta.  Based  on  our  experience  during  the  first
year,  we  have  made  research  plans  for  the  two  upcoming  years  of  the  RADIUS  project.